zlacker

[parent] [thread] 1 comments
1. hereon+(OP)[view] [source] 2023-09-12 18:08:01
Doesn't this depend a lot on your application though? Not every workload needs low latency and massive horizontal scalability.

Take their example of running the LLM over 2 million recipes and saving $23k versus GPT-4. That could easily be 2 million documents in some back-end system running as a batch job. Many people would happily wait a few days or weeks for a job like that to finish if it offered significant savings.
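Back-of-the-envelope, the tradeoff is just document count times tokens times per-token price. A toy sketch of that arithmetic (all rates and token counts below are hypothetical placeholders for illustration, not figures from the article or any provider's real pricing):

```python
# Rough sketch of the batch-economics argument above.
# All prices and token counts are hypothetical, chosen only
# to show the shape of the calculation.

def batch_cost(n_docs, tokens_per_doc, price_per_1k_tokens):
    """Total cost of pushing every document through a model once."""
    return n_docs * (tokens_per_doc / 1000) * price_per_1k_tokens

N_DOCS = 2_000_000       # e.g. the 2 million recipes/documents
TOKENS_PER_DOC = 500     # assumed average prompt + completion size

expensive = batch_cost(N_DOCS, TOKENS_PER_DOC, 0.03)   # hypothetical frontier-model rate
cheap = batch_cost(N_DOCS, TOKENS_PER_DOC, 0.005)      # hypothetical cheaper-model rate

print(f"expensive: ${expensive:,.0f}")   # → expensive: $30,000
print(f"cheap: ${cheap:,.0f}")           # → cheap: $5,000
print(f"saved: ${expensive - cheap:,.0f}")  # → saved: $25,000
```

The point is that at millions of documents even a small per-token price gap compounds into thousands of dollars, which is why a slow batch job on a cheaper model can be worth the wait.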

replies(1): >>minima+S
2. minima+S[view] [source] 2023-09-12 18:12:01
>>hereon+(OP)
That's a fair use case.

It also demonstrates, though, why the economics are complicated and there's no one-size-fits-all answer.
