I hope we find a path to at least fine-tuning medium-sized models for prices that aren't outrageous. Even the tiny corp's tinybox [1] is $15k and I don't know how much actual work one could get done on it.
If the majority of startups are just "wrappers around OpenAI (et al.)" the reason is pretty obvious.
When I was at Rad AI we managed just fine. We took a big chunk of our seed round and used it to purchase our own cluster, which we set up at Colovore in Santa Clara. We had dozens, not hundreds, of GPUs and it set us back about half a million.
The one thing I can't stress enough: do not rent these machines. For the cost of renting a machine from AWS for 8 months you can own one of these machines and cover all of the datacenter costs, which basically makes it "free" from the eighth month to the three-year mark. Once we decoupled our training from cloud prices we were able to do a lot more training and research. Maintenance of the machines is surprisingly easy, and they keep their value too since there's such a high demand for them.
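To make the rent-vs-buy arithmetic concrete, here's a rough sketch of the break-even calculation. The function name and all three dollar figures are made-up placeholders chosen so the answer lands on the 8 months mentioned above; they are not real AWS or colo quotes, so plug in your own numbers.

```python
import math


def break_even_months(purchase_price, monthly_rent, monthly_hosting):
    """Months of renting after which buying would have been cheaper.

    Owning costs purchase_price up front plus monthly_hosting (colo, power);
    renting costs monthly_rent. All inputs are in the same currency.
    """
    monthly_saving = monthly_rent - monthly_hosting
    if monthly_saving <= 0:
        # Hosting costs as much as renting, so buying never pays off.
        return math.inf
    return purchase_price / monthly_saving


# Hypothetical figures for illustration only, not real prices.
months = break_even_months(
    purchase_price=200_000,  # assumed cost of one multi-GPU server
    monthly_rent=30_000,     # assumed cloud rental for an equivalent machine
    monthly_hosting=5_000,   # assumed colo and power bill
)
print(f"Break-even after {months:.1f} months")
```

This ignores resale value, staff time, and cost of capital, so treat it as a first-pass estimate rather than a full TCO model.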
I'd also argue that you don't need the H100s to get started. Most of our initial work was on much cheaper GPUs, with the A100s we purchased being reserved for training production models rapidly. What you need, and what is far harder to get, is researchers who actually understand the models so they can improve the models themselves (rather than just compensating with more data and training). That was what really made the difference for Rad AI.
Even if I validate my idea on an RTX 4090, the path to scaling any idea gets expensive fast. $15k to move up to something like a tinybox (probably capable of running a 65B model, but is it realistic to train or fine-tune a 65B model on it?). Then maybe $100k in cloud costs. Then maybe $500k for a research-sized cluster. Then $10m+ for enterprise grade. I don't see that kind of ramp happening outside well-financed VC startups.
To put it another way, the $10m+ for enterprise grade just seems wrong to me. It's more like $10m+ for mediocre responses to a lot of things. Rad AI didn't spend $10m on their models, but they absolutely are professional grade and are in use today.
I also think it's important to consider capital costs that are a one-time thing, versus long-term costs. Once you purchase that $10m cluster you have that forever, not just for a single model, and because of the GPU scarcity right now that cluster isn't losing value nearly as rapidly as most hardware does. If you purchase a $500k cluster, use it for three years, and then sell it for $400k you're really not doing all that bad.
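Spelling out that last example: the purchase price, resale price, and three-year window below are the numbers from this comment, while the function name and the optional hosting parameter are just my own illustration. It's a back-of-the-envelope sketch that leaves out power, staff, and cost of capital.

```python
def net_monthly_cost(purchase_price, resale_price, months, monthly_hosting=0.0):
    """Effective monthly cost of owning hardware that is later resold.

    Spreads the depreciation (purchase minus resale) over the ownership
    period and adds any recurring hosting cost.
    """
    if months <= 0:
        raise ValueError("months must be positive")
    depreciation = purchase_price - resale_price
    return depreciation / months + monthly_hosting


# $500k cluster, used for three years, sold for $400k (hosting not included).
cost = net_monthly_cost(purchase_price=500_000, resale_price=400_000, months=36)
print(f"${cost:,.0f} per month")  # prints "$2,778 per month"
```

The resale figure does all the work here; if the GPU market cools and the cluster sells for much less, the monthly number climbs accordingly.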