zlacker

[return to "Claude Code: connect to a local model when your quota runs out"]
1. paxys+c7c 2026-02-04 21:59:44
>>fugu2+(OP)
> Reduce your expectations about speed and performance!

Wildly understating this part.

Even the best local models (the ones you run on beefy 128GB+ RAM machines) get nowhere close to the sheer intelligence of Claude/Gemini/Codex. At worst, these models will move you backwards and just increase the amount of work Claude has to do when your limits reset.
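
If you do want to try it anyway, the wiring itself is just a couple of environment variables, assuming you have a local proxy in front of your model that speaks the Anthropic Messages API (LiteLLM is one option). The proxy, port, and token below are placeholders, not a recommendation:

    # Point Claude Code at a local Anthropic-compatible endpoint.
    export ANTHROPIC_BASE_URL="http://localhost:4000"
    export ANTHROPIC_AUTH_TOKEN="local-dummy-key"  # most local proxies ignore this
    claude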

2. seanmc+ADc 2026-02-05 01:27:40
>>paxys+c7c
> (ones you run on beefy 128GB+ RAM machines)

PC or Mac? A PC, yeah, no way, not without beefy GPUs with lots of VRAM. A Mac? It depends on the chip: an M3 Ultra with 128GB of unified memory is going to get closer, at least. You can have a decent experience with a Max chip + 64GB of unified memory (well, that's my setup at least).
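
Back-of-envelope, the weights are what dominate (KV cache and runtime overhead come on top):

    # Rough memory footprint of model weights, in GB:
    # billions of params * bits per weight / 8
    def weights_gb(params_b: float, bits: int) -> float:
        return params_b * bits / 8

    print(weights_gb(32, 8))  # 32.0 -> workable on a 64GB Mac
    print(weights_gb(70, 8))  # 70.0 -> this is where 128GB+ machines come in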

3. Quantu+CEc 2026-02-05 01:36:12
>>seanmc+ADc
Which models do you use, and how do you run them?
4. seanmc+12d 2026-02-05 05:16:56
>>Quantu+CEc
I have an M3 Max with 64GB.

For VS Code code completion I use Continue with a Qwen3-Coder 7B model; for CLI work and the sidebar, a Qwen Coder 32B. 8-bit quants for both.
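
In Continue that looks something like this (older config.json format; the provider and model tags are illustrative, swap in whatever your local server actually serves):

    {
      "models": [
        { "title": "Qwen Coder 32B", "provider": "ollama", "model": "qwen2.5-coder:32b" }
      ],
      "tabAutocompleteModel": {
        "title": "Qwen Coder 7B",
        "provider": "ollama",
        "model": "qwen2.5-coder:7b"
      }
    }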

I need to take a look at Qwen3-Coder-Next; it's supposed to be much faster with a larger model.
