zlacker

[return to "Claude Code: connect to a local model when your quota runs out"]
1. paxys+c7c 2026-02-04 21:59:44
>>fugu2+(OP)
> Reduce your expectations about speed and performance!

Wildly understating this part.

Even the best local models (the ones you run on beefy 128GB+ RAM machines) get nowhere close to the sheer intelligence of Claude/Gemini/Codex. At worst, these models will move you backwards and just increase the amount of work Claude has to do when your limits reset.
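
If you do want to try it anyway, the wiring itself is just a couple of environment variables, assuming you have a local proxy in front of your model that speaks the Anthropic Messages API (LiteLLM is one option). The proxy, port, and token below are placeholders, not a recommendation:

    # Point Claude Code at a local Anthropic-compatible endpoint.
    export ANTHROPIC_BASE_URL="http://localhost:4000"
    export ANTHROPIC_AUTH_TOKEN="local-dummy-key"  # most local proxies ignore this
    claude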

2. seanmc+ADc 2026-02-05 01:27:40
>>paxys+c7c
> (ones you run on beefy 128GB+ RAM machines)

PC or Mac? A PC, yeah, no way, not without beefy GPUs with lots of VRAM. A Mac? It depends on the chip: an M3 Ultra with 128GB of unified memory is going to get closer, at least. You can have a decent experience with a Max chip + 64GB of unified memory (well, that's my setup at least).
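
Back-of-envelope, the weights are what dominate (KV cache and runtime overhead come on top):

    # Rough memory footprint of model weights, in GB:
    # billions of params * bits per weight / 8
    def weights_gb(params_b: float, bits: int) -> float:
        return params_b * bits / 8

    print(weights_gb(32, 8))  # 32.0 -> workable on a 64GB Mac
    print(weights_gb(70, 8))  # 70.0 -> this is where 128GB+ machines come in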

3. Quantu+CEc 2026-02-05 01:36:12
>>seanmc+ADc
Which models do you use, and how do you run them?
4. seanmc+12d 2026-02-05 05:16:56
>>Quantu+CEc
I have an M3 Max with 64GB.

For VS Code code completion I use Continue with a Qwen3-Coder 7B model; for CLI work and the sidebar, a Qwen Coder 32B. 8-bit quants for both.
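
In Continue that looks something like this (older config.json format; the provider and model tags are illustrative, swap in whatever your local server actually serves):

    {
      "models": [
        { "title": "Qwen Coder 32B", "provider": "ollama", "model": "qwen2.5-coder:32b" }
      ],
      "tabAutocompleteModel": {
        "title": "Qwen Coder 7B",
        "provider": "ollama",
        "model": "qwen2.5-coder:7b"
      }
    }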

I need to take a look at Qwen3-Coder-Next; it's supposed to be much faster with a larger model.
