zlacker

1. HarHar+(OP)[view] [source] 2025-01-22 00:37:32
The largest GPU cluster at the moment is X.ai's 100K H100s, which is ~$2.5B worth of GPUs. So something 10x bigger (1M GPUs) is ~$25B of GPUs, plus maybe another $10B for a 1GW nuclear reactor.
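Quick back-of-envelope in Python, just to make the arithmetic explicit (the ~$25K/GPU implied price and the $10B-per-GW figure are assumptions from my estimate, not quoted numbers):

```python
# Back-of-envelope cluster cost, using the figures above (all assumptions).
h100_cluster_gpus = 100_000            # X.ai's current cluster
h100_cluster_cost = 2.5e9              # ~$2.5B worth of GPUs
cost_per_gpu = h100_cluster_cost / h100_cluster_gpus   # ~$25K per H100

target_gpus = 1_000_000                # 10x bigger
gpu_capex = target_gpus * cost_per_gpu                  # ~$25B
reactor_capex = 10e9                   # assumed ~$10B for a 1GW nuclear plant

total = gpu_capex + reactor_capex
print(f"GPUs: ${gpu_capex/1e9:.0f}B, power: ${reactor_capex/1e9:.0f}B, total: ${total/1e9:.0f}B")
```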

This sort of $100-500B budget doesn't sound like training-cluster money; it sounds more like anticipating massive industry uptake and multiple datacenters running inference (with all of corporate America's data sitting in the cloud).

replies(2): >>intern+Rs >>anonzz+lN
2. intern+Rs[view] [source] 2025-01-22 04:25:07
>>HarHar+(OP)
Shouldn't there be a fear of obsolescence?
replies(1): >>HarHar+Vu
3. HarHar+Vu[view] [source] [discussion] 2025-01-22 04:46:54
>>intern+Rs
It seems you'd need to factor periodic hardware upgrades into the operating cost of a large cluster, as well as replacing failed GPUs - they only last a few years if run continuously.

I've read that some datacenters run mixed-generation GPUs - upgrading only a portion of the fleet at a time - but I'm not sure they all do that.

It'd be interesting to read something about how those upgrades are typically managed/scheduled.
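For a rough sense of scale, here's a minimal sketch of that amortization, assuming a hypothetical 4-year refresh cycle and ~5%/year failure replacement on a ~$25B fleet (all three numbers are my assumptions, not from the article):

```python
# Hypothetical amortization of GPU refresh cost for a large cluster.
# All inputs are assumptions for illustration, not figures from the thread.
fleet_capex = 25e9          # ~$25B of GPUs (from the estimate upthread)
useful_life_years = 4       # assumed refresh cycle if run continuously
annual_failure_rate = 0.05  # assumed fraction of GPUs replaced per year

annual_refresh = fleet_capex / useful_life_years      # straight-line refresh
annual_failures = fleet_capex * annual_failure_rate   # mid-life replacements
print(f"~${(annual_refresh + annual_failures)/1e9:.1f}B/year just to keep the fleet current")
```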

4. anonzz+lN[view] [source] 2025-01-22 08:00:35
>>HarHar+(OP)
Don't they say in the article that it is also for scaling up power and datacenters? That's the big cost here.
replies(1): >>HarHar+Yi1
5. HarHar+Yi1[view] [source] [discussion] 2025-01-22 12:47:17
>>anonzz+lN
There are the servers and datacenter infrastructure (cooling, electricity) as well as the GPUs, of course, but if we're talking $10B+ of GPUs in a single datacenter, it seems the GPUs would dominate. Electricity generation is also a big expense; nuclear seems the most viable option, although multi-GW solar plants are possible too in some locations. The 1GW ~ $10B number I suggested is in the right ballpark.
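As a sanity check on the ~1GW figure, here's a rough sketch; the ~700W per GPU and the 1.3x facility overhead are assumed values, not from the article:

```python
# Rough sanity check that 1M GPUs lands in the ~1GW range.
# Per-GPU draw and facility overhead (PUE) are assumed values for illustration.
gpus = 1_000_000
watts_per_gpu = 700        # H100-class board power, assumed
pue = 1.3                  # assumed datacenter overhead (cooling, networking, etc.)

total_gw = gpus * watts_per_gpu * pue / 1e9
capex_per_gw = 10e9        # the ~$10B/GW figure used upthread
print(f"~{total_gw:.2f} GW -> ~${total_gw * capex_per_gw / 1e9:.0f}B of generation capacity")
```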