> My estimate is that within 2 to 3 years, the lowest cost way to generate AI compute will be in space.
This is so obviously false. For one thing, in what fantasy world would the ongoing operational and maintenance needs be 0?
"No operational needs" is obviously an oversimplification. You still need to manage downlink capacity, station-keeping, collision avoidance, etc. But for a large constellation, the per-satellite cost of that would be pretty small.
The thing being called obvious here is that the maintenance you have to do on Earth is vastly cheaper than the overspeccing you need to do in space (otherwise we would overspec on Earth). That's before even considering the harsh radiation environment and the incredible cost to put even a single pound into low Earth orbit.
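For a rough sense of scale (both numbers in the sketch below are ballpark assumptions, not quotes from any launch provider or vendor): commonly cited pricing to LEO runs on the order of a few thousand dollars per kilogram, and a dense 8-GPU server weighs on the order of 100+ kg before you add power, cooling, or shielding.

```python
# Ballpark launch cost to lift a single 8-GPU server to LEO. Both numbers are
# rough assumptions for illustration (commonly cited ~$/kg pricing, typical
# server mass), ignoring radiators, solar arrays, and shielding entirely.
launch_cost_per_kg = 3_000  # USD per kg to LEO, ballpark assumption
server_mass_kg = 130        # rough mass of one 8-GPU server, assumption

print(f"${launch_cost_per_kg * server_mass_kg:,} just to lift one server")
# -> $390,000 just to lift one server
```

And that's per server, before any of the supporting hardware even gets weighed.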
Let's say, given component failure rates, you can expect 20% of the GPUs to fail in that time. I'd say that's acceptable.
A lot. As someone who has been responsible for training runs with up to 10K GPUs, things fail all the time. By all the time I don't mean every few weeks, I mean daily. From disks failing, to GPUs overheating, to InfiniBand optical connectors not being correctly fastened and disconnecting randomly, we have to send people to manually fix/debug things in the datacenter all the time.
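To put a rough number on "daily" (the per-GPU rate below is just an assumption for illustration, not data from those runs): at 10K-GPU scale even a modest annualized incident rate works out to multiple hardware incidents per day.

```python
# Back-of-envelope: expected hardware incidents per day in a 10K-GPU cluster.
# The annualized per-GPU incident rate is a hypothetical assumption.
n_gpus = 10_000
annual_incident_rate = 0.09  # assume 9% of GPUs hit an incident per year

expected_per_day = n_gpus * annual_incident_rate / 365
print(f"~{expected_per_day:.1f} incidents/day")  # -> ~2.5 incidents/day
```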
If one GPU fails, you essentially lose the entire node (so 8 GPUs), so if your strategy is to just permanently turn off whatever fails and never deal with it, it's gonna get very expensive very fast.
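A quick back-of-envelope sketch of what that costs you (the 20% figure is from the comment above; treating GPU failures as independent is my assumption):

```python
# Fraction of 8-GPU nodes with zero failed GPUs, assuming individual GPU
# failures are independent and each GPU fails with probability p.
def surviving_node_fraction(p: float, gpus_per_node: int = 8) -> float:
    return (1.0 - p) ** gpus_per_node

for p in (0.05, 0.10, 0.20):
    print(f"{p:.0%} GPU failures -> {surviving_node_fraction(p):.1%} of nodes fully healthy")
# 5% GPU failures -> 66.3% of nodes fully healthy
# 10% GPU failures -> 43.0% of nodes fully healthy
# 20% GPU failures -> 16.8% of nodes fully healthy
```

So with 20% of GPUs dead and a turn-it-off-forever policy, you're left with roughly a sixth of your nodes, not 80% of them.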
And that's in an environment where temperature is very well controlled and where you don't have to put your entire cluster through 4 Gs and insane vibrations during takeoff.