zlacker

Thread: xAI joins SpaceX
1. Saline+v3 2026-02-02 22:03:54
>>g-mork+(OP)
I still don't understand the "data center in space" narrative. How are they going to solve the cooling issue?
2. sebzim+94 2026-02-02 22:06:02
>>Saline+v3
Cooling a datacenter in space isn't really any harder than cooling a Starlink satellite; the ratio of solar panel area to radiator area will have to be about the same. There is nothing uniquely heat-producing about GPUs: ultimately, almost all the energy collected by a satellite's solar panels ends up as heat in the satellite.
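
Back-of-the-envelope, via the Stefan-Boltzmann law (the temperature, emissivity, and two-sided radiator are my assumptions, not anything from the announcement):

    # Radiator sizing: P = faces * eps * sigma * A * T^4, solved for A.
    # Assumed numbers (mine): radiator at 300 K, emissivity 0.9, radiating
    # from both faces, with essentially all collected power ending up as heat.
    SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

    def radiator_area_m2(power_w, temp_k=300.0, emissivity=0.9, faces=2):
        return power_w / (faces * emissivity * SIGMA * temp_k**4)

    print(radiator_area_m2(20e3))  # hypothetical 20 kW satellite -> ~24 m^2
    print(radiator_area_m2(1e9))   # 1 GW cluster -> ~1.2e6 m^2 (~1.2 km^2)

Per watt it's identical either way, which is the point: the ratio doesn't change, only the totals do.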

IMO the big problem is the lack of maintainability.

3. tavave+97 2026-02-02 22:16:19
>>sebzim+94
I think it's not just about the ratio. To me the difference is that Starlink satellites are fixed-scope, miniature satellites that perform a limited range of tasks. With GPUs, though, the goal is to maximize the amount of compute you send up, which means pushing as many of them up there as possible, to the point where you'd need huge megastructures with solar panels and radiators that would probably push the limits of what in-space construction can do. Sure, the ratio would be the same, but what about the scale?
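
Just to put numbers on the scale (the solar constant is real; panel efficiency and cluster power are my guesses):

    # Solar array area for a gigawatt-class cluster.
    # Assumptions (mine): 20% panel efficiency, full sunlight,
    # no pointing or degradation losses.
    SOLAR_CONSTANT = 1361.0  # W/m^2 at 1 AU

    def array_area_m2(power_w, efficiency=0.20):
        return power_w / (SOLAR_CONSTANT * efficiency)

    print(array_area_m2(1e9))  # ~3.7e6 m^2, i.e. ~3.7 km^2 of panels

For comparison, the ISS arrays, on the largest structure ever assembled in orbit, total a few thousand square meters. This is roughly three orders of magnitude beyond that.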

And you also need it to make sense not just from a maintenance standpoint, but from a financial one. In what world does launching the equivalent of huge facilities that work perfectly fine on the ground make sense? What's the point? If we had a space elevator and nearly free orbital deployment, then yeah, maybe, but how does this plan square with our current reality?
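
Even with numbers that flatter the idea, the launch bill alone is brutal (the cost per kg and the system's specific power are pure guesses on my part):

    # Launch cost for a gigawatt-class orbital cluster.
    # Assumptions (mine): the whole system (panels + radiators + GPUs +
    # structure) achieves 50 W/kg, and launch runs $1,500 per kg to LEO.
    def launch_cost_usd(power_w, w_per_kg=50.0, usd_per_kg=1500.0):
        mass_kg = power_w / w_per_kg
        return mass_kg * usd_per_kg

    print(launch_cost_usd(1e9))  # 20,000,000 kg -> $3e10, ~$30B just to launch

Cut the launch price tenfold and you're still billions in before you've bought a single GPU.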

Oh, and don't forget radiation shielding for all those delicate, cutting-edge processors.

4. pantal+9a 2026-02-02 22:27:53
>>tavave+97
Why would you need to fit the GPUs all in one structure?

You can have a swarm of small, disposable satellites with laser links between them.

5. tavave+yd 2026-02-02 22:38:09
>>pantal+9a
Because that brings in the whole distributed-computing mess. No matter how fast the individual links are, you still have to deal with which satellites can see one another, how many simultaneous links each satellite can maintain, maximum throughput, stronger error correction, and all sorts of other things that drastically slow the system down even in the best case. Unlike something like Starlink, with GPUs you have to assume every node may need to talk to every other node at the same time, at enormous throughput.

And if you want to send GPUs up one by one, get ready to equip each satellite with a fixed mass of everything required to transmit and receive that much data, plus redundant structural/power/compute mass, individual shielding, and more. All the overhead mass you launch with individual satellites makes the already nonsensical pricing even worse.

It just makes no sense when you can build a warehouse on the ground, fill it with shoulder-to-shoulder servers that communicate in a simple, sane, well-understood way, and repair them on the spot. What's the point?
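
To make the throughput point concrete, a toy ring all-reduce estimate (gradient size and link rates are my ballpark assumptions, not vendor specs):

    # Time to all-reduce one set of gradients around a ring of n nodes:
    # each node sends/receives about 2 * (n-1)/n * G bytes, G = gradient size.
    # Assumptions (mine): 1 TB of gradients, ~100 Gbps per optical
    # inter-satellite link vs. ~900 GB/s NVLink-class links on the ground.
    def allreduce_seconds(grad_bytes, link_bits_per_s, n_nodes=1000):
        bytes_moved = 2 * (n_nodes - 1) / n_nodes * grad_bytes
        return bytes_moved * 8 / link_bits_per_s

    print(allreduce_seconds(1e12, 100e9))   # laser links -> ~160 s per sync
    print(allreduce_seconds(1e12, 7.2e12))  # NVLink-class -> ~2.2 s per sync

And the ring model is generous: it assumes every satellite holds a stable line-of-sight link to its neighbors for the entire sync.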
6. crote+OA 2026-02-03 00:24:18
>>tavave+yd
Isn't this already a major problem for AI clusters?

I vaguely recall an article a while ago about the impact of GPU reliability: a big problem with training is that the entire cluster basically operates in lock-step, with each node needing the data its neighbors calculated during the previous step before it can proceed. The unfortunate side-effect is that any failure stops the entire hundred-thousand-node cluster - as the cluster grows, even the tiniest per-node failure rate will absolutely ruin your uptime. I think they somehow solved this, but I have no idea how.
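
The math is ugly (the per-node MTBF is my guess; checkpoint/restart is just the standard mitigation, I don't know what any given lab actually does):

    # A cluster of n lock-stepped nodes, each failing independently with
    # MTBF of m hours, sees a failure roughly every m / n hours.
    # Assumption (mine): per-node MTBF of 50,000 hours (~5.7 years).
    def cluster_mtbf_hours(node_mtbf_h=50_000.0, n_nodes=100_000):
        return node_mtbf_h / n_nodes

    print(cluster_mtbf_hours())  # 0.5 -> an interruption every ~30 minutes

    # With checkpoints every c hours and restart time r, each failure costs
    # about c/2 + r hours of lost work on average.
    def fraction_lost(checkpoint_h=0.25, restart_h=0.1, mtbf_h=0.5):
        return (checkpoint_h / 2 + restart_h) / mtbf_h

    print(fraction_lost())  # 0.45 -> nearly half your cluster-hours wasted

Which is presumably why so much effort goes into fast checkpointing and hot spares.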
