zlacker

[return to "Claude Code: connect to a local model when your quota runs out"]
1. paxys+c7c[view] [source] 2026-02-04 21:59:44
>>fugu2+(OP)
> Reduce your expectations about speed and performance!

Wildly understating this part.

Even the best local models (the ones you run on beefy 128GB+ RAM machines) get nowhere close to the sheer intelligence of Claude/Gemini/Codex. At worst, these models will move you backwards and just increase the amount of work Claude has to do once your limits reset.

2. EagnaI+Zhd[view] [source] 2026-02-05 07:54:57
>>paxys+c7c
The secret is to not run out of quota.

Instead, have Claude know when to offload work to local models and which model is best suited for the job. It will shape the prompt for the model; then have Claude review the results. Massive reduction in costs.
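
To give an idea of the kind of rules I mean, the standing instructions end up looking something like this (the model names are just examples of what you might have pulled locally):

    ## Offloading to local models
    - For bulk, low-risk work (summarising files, drafting boilerplate tests,
      mechanical renames), hand the task to the local-model tool instead of
      doing it yourself.
    - Use a small code model (e.g. qwen2.5-coder) for code and a general
      model (e.g. llama3.1) for prose.
    - Rewrite the task into a short, self-contained prompt first, and always
      review the local model's output before acting on it.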

btw, at least on MacBooks you can run good models with just an M1 and 32GB of memory.

3. BuildT+Ujd[view] [source] 2026-02-05 08:12:43
>>EagnaI+Zhd
I don't suppose you could point to any resources on where I could get started? I have an M2 with 64GB of unified memory, and it'd be nice to make that work rather than burning GitHub credits.
4. EagnaI+Jmd[view] [source] 2026-02-05 08:36:15
>>BuildT+Ujd
https://ollama.com

Although I'm starting to like LM Studio more, as it has features that Ollama is missing.

https://lmstudio.ai

You can then get Claude to create an MCP server to talk to either. Then add a CLAUDE.md that tells it to read the models you have downloaded, work out what each one is good at, and decide when to offload. Claude will write all of that for you as well.
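
If you're curious what that ends up looking like, a minimal sketch of the Ollama side, using the official MCP Python SDK, is something like this (the endpoint and JSON fields are Ollama's defaults; the tool names are just what I'd pick):

    # local_models_mcp.py -- minimal MCP server exposing local Ollama models
    import json
    import urllib.request

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("local-models")

    OLLAMA = "http://localhost:11434"  # Ollama's default local API


    @mcp.tool()
    def list_local_models() -> list[str]:
        """Return the names of the models Ollama has downloaded."""
        with urllib.request.urlopen(f"{OLLAMA}/api/tags") as resp:
            return [m["name"] for m in json.loads(resp.read())["models"]]


    @mcp.tool()
    def ask_local_model(model: str, prompt: str) -> str:
        """Send a prompt to a local model and return its full reply."""
        req = urllib.request.Request(
            f"{OLLAMA}/api/generate",
            data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]


    if __name__ == "__main__":
        mcp.run()  # stdio transport, which is what Claude Code uses for local servers

Register it with "claude mcp add" and let the CLAUDE.md rules decide when Claude actually calls it.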
