zlacker

[return to "Claude Code: connect to a local model when your quota runs out"]
1. paxys+c7c[view] [source] 2026-02-04 21:59:44
>>fugu2+(OP)
> Reduce your expectations about speed and performance!

Wildly understating this part.

Even the best local models (ones you run on beefy 128GB+ RAM machines) get nowhere close to the sheer intelligence of Claude/Gemini/Codex. At worst these models will move you backwards and just increase the amount of work Claude has to do when your limits reset.

◧◩
2. zozbot+i8c[view] [source] 2026-02-04 22:05:11
>>paxys+c7c
The best open models such as Kimi 2.5 are about as smart today as the big proprietary models were one year ago. That's not "nothing" and is plenty good enough for everyday work.
◧◩◪
3. reilly+A9c[view] [source] 2026-02-04 22:11:11
>>zozbot+i8c
Which takes a $20k thunderbolt cluster of 2 512GB RAM Mac Studio Ultras to run at full quality…
◧◩◪◨
4. polyno+M8d[view] [source] 2026-02-05 06:23:52
>>reilly+A9c
Depending on what your usage requirements are, Mac Minis running UMA over RDMA is becoming a feasible option. At roughly 1/10 of the cost you're getting much much more than 1/10 the performance. (YMMV)

https://buildai.substack.com/i/181542049/the-mac-mini-moment

◧◩◪◨⬒
5. danw19+yNd[view] [source] 2026-02-05 12:26:54
>>polyno+M8d
I did not expect this to be a limiting factor in the mac mini RDMA setup ! -

> Thermal throttling: Thunderbolt 5 cables get hot under sustained 15GB/s load. After 10 minutes, bandwidth drops to 12GB/s. After 20 minutes, 10GB/s. Your 5.36 tokens/sec becomes 4.1 tokens/sec. Active cooling on cables helps but you’re fighting physics.

Thermal throttling of network cables is a new thing to me…

[go to top]