zlacker

1. zoogen+ (OP) 2023-08-01 17:58:56
The question is what happens once you want to transition from your RTX 4090 to a business. It might be cute to generate 10 tokens per second, or whatever you can get out of whatever model you have, to delight your family and friends. But once you want to scale that into a genuine product, you're up against the cost ramp: even a modest inference rig is going to cost hundreds of thousands of dollars. You have no real way to validate your business model without making a big investment.
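
A rough back-of-envelope makes the gap concrete. Every number here is an assumption for illustration (not a quote), but the shape of the arithmetic is the point:

    # From "demo on my 4090" to serving real users - illustrative numbers only.
    hobby_tokens_per_sec = 10           # assumed: one RTX 4090 running a local model
    users = 1_000                       # assumed: concurrent users you'd like to serve
    tokens_per_user_per_sec = 5         # assumed: sustained rate each user expects

    required_throughput = users * tokens_per_user_per_sec   # 5,000 tokens/s
    server_tokens_per_sec = 2_000       # assumed: one 8x H100 box with batching
    server_cost_usd = 300_000           # assumed: price of that box

    servers_needed = -(-required_throughput // server_tokens_per_sec)  # ceiling division
    capex = servers_needed * server_cost_usd
    print(f"servers needed: {servers_needed}, up-front hardware cost: ${capex:,}")
    # -> servers needed: 3, up-front hardware cost: $900,000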

Of course, it is the businesses that find a way to make this work that will succeed. It isn't an impossible problem, just a seemingly difficult one for now. That is why I mentioned VC funding as appearing to have more leverage over this market than over previous ones. If you can find someone to foot the $250k+ cost (e.g. AI Grant [1], which offers $250k in cash and $350k in cloud compute), then you might have a chance.

1. https://aigrant.org/

replies(1): >>pas+V11
2. pas+V11 2023-08-01 21:58:48
>>zoogen+(OP)
You can use a lower-performance model, you can use an LLM-as-a-service, etc.
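
The LLM-as-a-service route is just an HTTP call, so you can validate the product before buying any hardware. A minimal sketch against an OpenAI-style chat completions endpoint (the env var and model name are placeholders; any hosted provider with a similar API works the same way):

    import os
    import requests

    API_KEY = os.environ["LLM_API_KEY"]   # placeholder env var name
    URL = "https://api.openai.com/v1/chat/completions"

    def complete(prompt: str) -> str:
        # Send one chat-style prompt to the hosted model and return its reply text.
        resp = requests.post(
            URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "model": "gpt-3.5-turbo",   # cheap hosted model; pick whatever fits
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 200,
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    print(complete("Summarize why renting inference beats buying it for an MVP."))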

If you want to compete on the actual model, then yes, this is not the time for garage shops.

If your business plan is that good, it will work without H100 "cards" too. And if it's even better, and you know it will print money once you have H100s, then great: just wait.
