zlacker

The largest GPU cluster at the moment is xAI's 100K H100s, which is ~$2.5B worth of GPUs. So something 10x bigger (1M GPUs) is ~$25B, plus ~$10B for a 1GW nuclear reactor.

This sort of $100-500B budget doesn't sound like training-cluster money; it sounds more like anticipating massive industry uptake and multiple datacenters running inference (with all of corporate America's data sitting in the cloud).
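
The back-of-envelope numbers above can be sketched in a few lines of Python. This is only a rough illustration: it uses the figures quoted in this comment (~$2.5B for 100K H100s, ~$10B for a 1GW reactor) and assumes cost scales linearly with GPU count, which ignores volume pricing and newer GPU generations. The function and constant names are made up for the example.

```python
# Back-of-envelope cluster cost, using only the figures quoted in the comment.
GPUS_REFERENCE = 100_000      # reference cluster size (100K H100s)
COST_REFERENCE_USD = 2.5e9    # ~$2.5B worth of GPUs for that cluster
REACTOR_COST_USD = 10e9       # ~$10B for a 1GW nuclear reactor


def cluster_cost_usd(num_gpus, include_reactor=True):
    """Estimate GPU cost (plus optional reactor), assuming linear scaling."""
    per_gpu = COST_REFERENCE_USD / GPUS_REFERENCE
    total = per_gpu * num_gpus
    if include_reactor:
        total += REACTOR_COST_USD
    return total


per_gpu_usd = COST_REFERENCE_USD / GPUS_REFERENCE
total_usd = cluster_cost_usd(1_000_000)

print(f"implied cost per GPU: ${per_gpu_usd:,.0f}")
print(f"1M GPUs + reactor:    ${total_usd / 1e9:.0f}B")
```

Under these assumptions the total comes to about $35B, well short of the $100-500B range.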

replies(2): >>intern+Rs >>anonzz+lN

>>HarHar+(OP)
Shouldn't there be a fear of obsolescence?

replies(1): >>HarHar+Vu

>>intern+Rs
It seems you'd need to figure periodic updates into the operating cost of a large cluster, as well as replacing failed GPUs - they only last a few years if run continuously.

I've read that some datacenters run mixed-generation GPUs, upgrading only some at a time, but I'm not sure if they all do that.

It'd be interesting to read something about how updates are typically managed/scheduled.

>>HarHar+(OP)
Don't they say in the article that it is also for scaling up power and datacenters? That's the big cost here.

replies(1): >>HarHar+Yi1

>>anonzz+lN
There are the servers and data center infrastructure (cooling, electricity) as well as the GPUs, of course, but if we're talking $10B+ of GPUs in a single datacenter, it seems the GPUs would dominate. Electricity generation is also a big expense, and nuclear seems the most viable option, although multi-GW solar plants are possible too in some locations. The 1GW ~ $10B number I suggested is in the right ballpark.
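
As a rough sanity check on why ~1GW is the relevant scale for a 1M-GPU datacenter, here is a small Python sketch. The inputs are assumptions for illustration, not numbers from the thread: 700W per GPU (the maximum TDP of an H100 SXM) and a hypothetical 1.3x facility overhead factor for cooling and power delivery. It ignores CPUs, networking and storage, so treat it as a lower-end estimate.

```python
# Rough facility power estimate for a large GPU cluster.
GPU_TDP_WATTS = 700      # assumption: H100 SXM maximum TDP
FACILITY_OVERHEAD = 1.3  # assumption: hypothetical cooling/power-delivery overhead


def facility_power_gw(num_gpus, tdp_watts=GPU_TDP_WATTS, overhead=FACILITY_OVERHEAD):
    """Return estimated facility power draw in gigawatts."""
    return num_gpus * tdp_watts * overhead / 1e9


power_gw = facility_power_gw(1_000_000)
print(f"estimated draw for 1M GPUs: {power_gw:.2f} GW")
```

With these inputs the estimate lands a bit under 1GW, so sizing the power plant at 1GW looks consistent.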