Instead, have Claude know when to offload work to local models and which model is best suited for the job. It shapes the prompt for that model, then reviews the results afterwards. Massive reduction in costs.
btw, at least on MacBooks you can run good models with just an M1 and 32GB of memory.
Although I'm starting to like LM Studio more, since it has features that Ollama is missing.
You can then get Claude to create an MCP server that talks to either one, plus a CLAUDE.md that tells it to read the models you have downloaded, figure out what each is good for, and decide when to offload. Claude will write all of that for you as well.
The big, powerful model thinks about the task, then offloads pieces of it to a drastically cheaper cloud model or to a model running on your own hardware.
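
To make that concrete, here's a rough sketch of what the offload tool could look like, assuming the official Python MCP SDK (FastMCP) and the OpenAI-compatible chat endpoint that both Ollama (default port 11434) and LM Studio (default port 1234) expose. The tool name, default model, and URL are just illustrative placeholders, not something Claude would necessarily generate:

```python
# Minimal sketch, not a drop-in implementation: an MCP tool that forwards a
# prompt to a locally hosted model over the OpenAI-compatible API.
# "offload", the model name, and the URL below are illustrative assumptions.
import json
import urllib.request

from mcp.server.fastmcp import FastMCP  # official Python MCP SDK

# Ollama's OpenAI-compatible endpoint; LM Studio uses http://localhost:1234/v1/...
LOCAL_API = "http://localhost:11434/v1/chat/completions"

mcp = FastMCP("local-offload")

@mcp.tool()
def offload(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send a prompt to a locally hosted model and return its reply."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        LOCAL_API, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    mcp.run()  # serves over stdio so Claude can call the tool
```

Your CLAUDE.md then just lists which models you have installed, what each one is good at, and when Claude should call the offload tool instead of burning its own tokens on the work.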