zlacker

> Yeah I'm on the $20/mo Google plan and have been rate limited maybe twice in 2 months. Tried the equivalent Claude plan for a similar workload and lasted maybe 40 minutes before it asked me to upgrade to Max to continue.

The TLDR: The $20/40m cost is more reflective of what inference actually costs, including the amortised cost of the Capex, together with the Opex.

The Long Read:

I think the reason is because Anthropic is attempting to run inference at a profit and Google isn't.

Another reason could be that they don't own their cost centers (GPUs are from Nvidia, Cloud instances are from AWS, data centers from AWS, etc); they own only the model but rent everything else needed for inference so pay a margin for all those rented cost centers.

Google owns their entire vertical (GPUs are google-made, Cloud instances and datacenters are Google-owned, etc) and can apply vertical cost optimisations, so their final cost of inference is going to be much cheaper anyway even if they were not subsidising inference with their profits from unrelated business units.

replies(1): >>jckahn+Rw