And it also has to make sense financially, not just from a maintenance standpoint. In what world does it make sense to launch the equivalent of huge facilities that work perfectly fine on the ground? What's the point? If we had a space elevator and nearly free deployment to orbit, then yeah, maybe, but how does this plan square with our current reality?
Oh, and don't forget about getting some good shielding for all those precise, cutting-edge processors.
You can have a swarm of small, disposable satellites with laser links between them.
And for data centers, the satellites wouldn't be as far apart as Starlink satellites; they would be quite close together instead.
And a single cluster today would already require more solar & cooling capacity than all Starlink satellites combined.
I vaguely recall an article a while ago about the impact of GPU reliability: a big problem with training is that the entire cluster basically operates in lock-step, with each node needing the data its neighbors calculated during the previous step to proceed. The unfortunate side-effect is that any failure stops the entire hundred-thousand-node cluster from proceeding - as the cluster grows even the tiniest failure rate is going to absolutely ruin your uptime. I think they managed to somehow solve this, but I have absolutely no idea how they managed to do it.
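For a rough feel of why lock-step scaling hurts, here is a back-of-the-envelope sketch. It assumes node failures are independent and that any single failure stalls the whole job; the function names and the numbers in the usage note are made up for illustration and are not taken from the article.

```python
import math


def cluster_survival_prob(node_failure_prob: float, num_nodes: int) -> float:
    """Probability that no node fails during one interval.

    Assumes independent failures (an assumption, real failures can be
    correlated), so the per-node survival probabilities multiply.
    """
    return (1.0 - node_failure_prob) ** num_nodes


def cluster_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """Mean time between failures for the whole cluster.

    Assumes independent, exponentially distributed node failures, in which
    case the failure rates add up and the cluster MTBF is node MTBF / N.
    """
    return node_mtbf_hours / num_nodes


if __name__ == "__main__":
    nodes = 100_000
    # Hypothetical figures, chosen only to show the scaling.
    print(cluster_survival_prob(1e-5, nodes))
    print(cluster_mtbf_hours(50_000, nodes))
```

With these made-up inputs, a node that fails on average once every 50,000 hours gives a 100,000-node cluster a failure roughly every half hour, and a per-interval failure chance of 1 in 100,000 per node leaves only about a 37% chance (roughly 1/e) that the whole cluster gets through that interval untouched.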