> My estimate is that within 2 to 3 years, the lowest cost way to generate AI compute will be in space.
This is so obviously false. For one thing, in what fantasy world would the ongoing operational and maintenance needs be 0?
"No operational needs" is obviously an oversimplification. You still need to manage downlink capacity, station-keeping, collision avoidance, etc. But for a large constellation, the per-satellite cost of that would be pretty small.
The thing being called obvious here is that the maintenance you have to do on Earth is vastly cheaper than the overspeccing you need to do in space (otherwise we would overspec on Earth). That's before even considering the harsh radiation environment and the incredible cost to put even a single pound into low Earth orbit.
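For a rough sense of scale (both numbers in the sketch below are ballpark assumptions, not quotes from any launch provider or vendor): commonly cited pricing to LEO runs on the order of a few thousand dollars per kilogram, and a dense 8-GPU server weighs on the order of 100+ kg before you add power, cooling, or shielding.

```python
# Ballpark launch cost to lift a single 8-GPU server to LEO. Both numbers are
# rough assumptions for illustration (commonly cited ~$/kg pricing, typical
# server mass), ignoring radiators, solar arrays, and shielding entirely.
launch_cost_per_kg = 3_000  # USD per kg to LEO, ballpark assumption
server_mass_kg = 130        # rough mass of one 8-GPU server, assumption

print(f"${launch_cost_per_kg * server_mass_kg:,} just to lift one server")
# -> $390,000 just to lift one server
```

And that's per server, before any of the supporting hardware even gets weighed.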
Let's say, given component failure rates, you can expect 20% of the GPUs to fail in that time. I'd say that's acceptable.
A lot. As someone who has been responsible for training runs with up to 10K GPUs, things fail all the time. By all the time I don't mean every few weeks, I mean daily. From disks failing, to GPUs overheating, to InfiniBand optical connectors not being correctly fastened and disconnecting randomly, we have to send people to manually fix/debug things in the datacenter all the time.
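To put a rough number on "daily" (the per-GPU rate below is just an assumption for illustration, not data from those runs): at 10K-GPU scale even a modest annualized incident rate works out to multiple hardware incidents per day.

```python
# Back-of-envelope: expected hardware incidents per day in a 10K-GPU cluster.
# The annualized per-GPU incident rate is a hypothetical assumption.
n_gpus = 10_000
annual_incident_rate = 0.09  # assume 9% of GPUs hit an incident per year

expected_per_day = n_gpus * annual_incident_rate / 365
print(f"~{expected_per_day:.1f} incidents/day")  # -> ~2.5 incidents/day
```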
If one GPU fails, you essentially lose the entire node (so 8 GPUs), so if your strategy is to just permanently turn off whatever fails and never deal with it, it's gonna get very expensive very fast.
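A quick back-of-envelope sketch of what that costs you (the 20% figure is from the comment above; treating GPU failures as independent is my assumption):

```python
# Fraction of 8-GPU nodes with zero failed GPUs, assuming individual GPU
# failures are independent and each GPU fails with probability p.
def surviving_node_fraction(p: float, gpus_per_node: int = 8) -> float:
    return (1.0 - p) ** gpus_per_node

for p in (0.05, 0.10, 0.20):
    print(f"{p:.0%} GPU failures -> {surviving_node_fraction(p):.1%} of nodes fully healthy")
# 5% GPU failures -> 66.3% of nodes fully healthy
# 10% GPU failures -> 43.0% of nodes fully healthy
# 20% GPU failures -> 16.8% of nodes fully healthy
```

So with 20% of GPUs dead and a turn-it-off-forever policy, you're left with roughly a sixth of your nodes, not 80% of them.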
And that's in an environment where temperature is very well controlled and where you don't have to put your entire cluster through 4 Gs and insane vibrations during takeoff.