zlacker

[return to "Claude Code: connect to a local model when your quota runs out"]
1. paxys+c7c[view] [source] 2026-02-04 21:59:44
>>fugu2+(OP)
> Reduce your expectations about speed and performance!

Wildly understating this part.

Even the best local models (ones you run on beefy 128GB+ RAM machines) get nowhere close to the sheer intelligence of Claude/Gemini/Codex. At worst these models will move you backwards and just increase the amount of work Claude has to do when your limits reset.

2. bityar+upc[view] [source] 2026-02-04 23:40:55
>>paxys+c7c
Correct, nothing that fits on your desk or lap is going to compete with a rack full of datacenter equipment. Well spotted.

But as a counterpoint: there are whole communities of people in this space who get significant value from models they run locally. I am one of them.

3. Gravey+Lqc[view] [source] 2026-02-04 23:50:19
>>bityar+upc
Would you mind sharing your hardware setup and use case(s)?
4. Camper+lrc[view] [source] 2026-02-04 23:55:28
>>Gravey+Lqc
Not the GP, but the new Qwen-Coder-Next release feels like a step change: 60 tokens per second on a single 96GB Blackwell. And that's at full 8-bit quantization and 256K context, which I wasn't sure was going to work at all.

It is probably enough to handle a lot of what people use the big-3 closed models for. Somewhat slower and somewhat dumber, granted, but still extraordinarily capable. It punches way above its weight class for an 80B model.
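
If anyone wants to sanity-check a setup like this, here's a minimal sketch of how I'd smoke-test a locally served model from Python, assuming something like vLLM or llama.cpp's server is exposing an OpenAI-compatible endpoint on localhost; the port and model id below are placeholders for whatever you actually launched, not official names:

    # Smoke test against a local OpenAI-compatible server
    # (vLLM and llama.cpp's server both speak this protocol).
    # Endpoint and model id are placeholders; adjust to your launch flags.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # local server, nothing leaves the machine
        api_key="not-needed",                 # local servers typically ignore the key
    )

    resp = client.chat.completions.create(
        model="qwen-coder-next",  # placeholder model id
        messages=[{"role": "user", "content": "Reverse a string in Python, one line."}],
        max_tokens=128,
        temperature=0.2,
    )
    print(resp.choices[0].message.content)

Throughput will obviously depend on quantization and context length, but it's enough to confirm the endpoint works before pointing a coding agent at it.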

5. paxys+XPc[view] [source] 2026-02-05 03:13:51
>>Camper+lrc
"Single 96GB Blackwell" is still $15K+ worth of hardware. You'd have to use it at full capacity for 5-10 years to break even when compared to "Max" plans from OpenAI/Anthropic/Google. And you'd still get nowhere near the quality of something like Opus. Yes, there are plenty of valid arguments in favor of self-hosting, but at the moment value simply isn't one of them.
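
Back-of-the-envelope, with the plan prices being my own rough assumptions (and ignoring electricity, depreciation, and resale value):

    # Rough break-even: local hardware vs. a paid top-tier subscription.
    # All figures are assumptions for illustration, not quoted prices.
    hardware_cost = 15_000      # ~96GB Blackwell workstation, USD
    plan_monthly = [100, 200]   # rough range for "Max"-style plans, USD/month

    for monthly in plan_monthly:
        years = hardware_cost / (monthly * 12)
        print(f"${monthly}/mo plan -> break even after ~{years:.1f} years")

    # -> ~12.5 years at $100/mo, ~6.3 years at $200/mo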