Wildly understating this part.
Even the best local models (the ones you run on beefy 128GB+ RAM machines) get nowhere close to the sheer intelligence of Claude/Gemini/Codex. At worst, these models will move you backwards and just increase the amount of work Claude has to do when your limits reset.
Instead, have Claude decide when to offload work to local models and which model is best suited for the job; it shapes the prompt for that model, then reviews the results. Massive reduction in costs. Rough sketch of the loop below.
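A minimal sketch of that delegate-then-review loop, assuming Ollama is serving the local model on its default port; the model names and the example task are placeholders, not a prescription:

```python
import anthropic
import requests

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask_claude(prompt: str) -> str:
    msg = claude.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: any capable Claude model works here
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def ask_local(prompt: str, model: str = "qwen2.5-coder:14b") -> str:
    # Ollama's local generate endpoint; the model tag is an assumption.
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    return r.json()["response"]

task = "Write docstrings for every function in utils.py"

# 1. Claude shapes a prompt suited to the weaker model.
shaped = ask_claude(
    "Rewrite this task as a precise, self-contained prompt for a small "
    f"local coding model. Output only the prompt.\n\nTask: {task}"
)

# 2. The cheap local model does the bulk of the work.
draft = ask_local(shaped)

# 3. Claude reviews and corrects the draft, so the expensive model
#    only spends tokens on a short review pass instead of the full job.
final = ask_claude(f"Review and correct this output. Task: {task}\n\nDraft:\n{draft}")
print(final)
```

The point of the split: the big model burns tokens only on shaping the prompt and reviewing the draft, while the bulk generation runs free on local hardware.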
btw, at least on MacBooks you can run good models with just an M1 and 32GB of memory.
The big, powerful models think about tasks, then offload some of the work to a drastically cheaper cloud model or to a model running on your own hardware.