zlacker

[return to "Tell HN: I cut Claude API costs from $70/month to pennies"]
1. LTL_FT+mC 2026-01-26 07:03:37
>>ok_orc+(OP)
It sounds like you don't need immediate LLM responses and could batch-process your data nightly? Have you considered running a local LLM? You may not need to pay for API calls at all; today's local models are quite good. I started off on CPU and even that was fine for my pipelines.
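For anyone curious what "batch process nightly against a local LLM" looks like in practice, here is a minimal sketch. It assumes a local OpenAI-compatible server (llama.cpp's llama-server and Ollama both expose a `/v1/chat/completions` endpoint); the port, model name, and prompt are placeholders, not anything OP described:

```python
import json
from urllib import request

# Assumed local endpoint; llama-server defaults to port 8080,
# Ollama to 11434. Adjust to whatever you actually run.
LOCAL_API = "http://localhost:8080/v1/chat/completions"

def build_payload(record: str, model: str = "gpt-oss-20b") -> dict:
    """One chat-completion request for a single record."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": f"Summarize: {record}"}],
    }

def run_batch(records, url=LOCAL_API):
    """POST each record sequentially. Slow is fine for a nightly
    cron job, and locally there is no per-call cost."""
    results = []
    for rec in records:
        req = request.Request(
            url,
            data=json.dumps(build_payload(rec)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            results.append(json.load(resp))
    return results
```

Point a cron entry at this once a night and throughput barely matters, which is exactly why CPU inference is tolerable here.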
2. ydu1a2+9i1 2026-01-26 13:15:24
>>LTL_FT+mC
Can you suggest any good LLMs for CPU?
3. R_D_Ol+dK1 2026-01-26 15:41:08
>>ydu1a2+9i1
Following.
4. LTL_FT+JS9 2026-01-28 17:14:00
>>R_D_Ol+dK1
I started off running gpt-oss-120b on CPU. It uses about 60–65 GB of memory, and my workstation has 128 GB of RAM. If I had less RAM, I would start with the gpt-oss-20b model and go from there. Look for MoE models: they are more efficient to run on CPU, since only a fraction of the weights are active per token.
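The rule of thumb above ("120b fits in 128 GB, otherwise drop to 20b") can be sketched as a tiny picker. The ~65 GB figure for gpt-oss-120b comes from this comment; the 20b footprint and the 1.5x safety headroom are illustrative guesses, not measurements:

```python
import os

# Rough CPU-inference footprints in GB. The 120b number is from the
# comment above; the 20b number is an assumed ballpark.
MODEL_RAM_GB = {
    "gpt-oss-120b": 65,
    "gpt-oss-20b": 16,
}

def pick_model(total_ram_gb: float, headroom: float = 1.5) -> str:
    """Largest model whose footprint (with headroom for OS, KV cache,
    and your pipeline) still fits in physical RAM."""
    for name, need in sorted(MODEL_RAM_GB.items(), key=lambda kv: -kv[1]):
        if need * headroom <= total_ram_gb:
            return name
    return "gpt-oss-20b"  # smallest option as the fallback

def total_ram_gb() -> float:
    """Total physical memory in GiB (POSIX only)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 2**30
```

With the numbers above, `pick_model(128)` lands on gpt-oss-120b (65 * 1.5 = 97.5 GB fits) and `pick_model(32)` falls back to gpt-oss-20b, matching the advice in the comment.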