zlacker

1. Coffee+(OP)[view] [source] 2025-06-27 10:56:40
It would be interesting to see whether the usual failure-rate curve over time still holds after a rocket launch and time spent in space. My guess is that it wouldn’t, but that’s just a guess.
replies(1): >>vidarh+06
2. vidarh+06[view] [source] 2025-06-27 12:01:39
>>Coffee+(OP)
I think it's likely the overall failure rate would be higher, and you might find you need more aggressive burn-in, but even then you'd need an extremely high failure rate before replacing components becomes more efficient than writing whole servers off (a rough back-of-envelope is sketched below).
replies(1): >>Mobius+cq
3. Mobius+cq[view] [source] [discussion] 2025-06-27 15:06:05
>>vidarh+06
The bathtub curve isn’t the same for all components of a server though. Writing off the entire server because a single RAM chip, SSD, or network card failed would limit the whole server to the lifetime of its weakest part. I think you would want redundant hot spares of the components with the lowest mean time between failures (see the sketch below).
replies(1): >>vidarh+3t
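
A minimal sketch of that "weakest part" effect and of what a redundant hot spare buys, assuming (purely for illustration) exponential component lifetimes and made-up MTBF figures:

    import math

    # Treat each component's lifetime as exponential with an assumed MTBF (hours);
    # the server survives to time t only if every component slot still has a
    # working unit.  All MTBF values are invented for illustration.
    MTBF_HOURS = {
        "cpu": 400_000,
        "ram": 250_000,
        "ssd": 120_000,   # assumed weakest part
        "nic": 300_000,
        "psu": 200_000,
    }

    def survival(mtbf: float, t: float) -> float:
        """P(a single unit still works at time t) for an exponential lifetime."""
        return math.exp(-t / mtbf)

    def server_survival(t: float, spares: dict[str, int] | None = None) -> float:
        """Server survives if each slot has at least one working unit;
        spares[name] is the number of redundant hot spares for that slot."""
        spares = spares or {}
        p = 1.0
        for name, mtbf in MTBF_HOURS.items():
            units = 1 + spares.get(name, 0)
            p_slot_dead = (1 - survival(mtbf, t)) ** units
            p *= 1 - p_slot_dead
        return p

    t = 5 * 365 * 24  # five years, in hours
    print(f"no redundancy:      {server_survival(t):.2f}")
    print(f"one spare SSD:      {server_survival(t, {'ssd': 1}):.2f}")
    print(f"spare SSD and PSU:  {server_survival(t, {'ssd': 1, 'psu': 1}):.2f}")

Under these assumptions the unredundant server is only as likely to reach five years as its weakest slot allows; a spare for the lowest-MTBF parts recovers much of that without keeping the whole chassis serviceable.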
4. vidarh+3t[view] [source] [discussion] 2025-06-27 15:25:13
>>Mobius+cq
We do often write off an entire server when a single component fails, because the lifetime of even the shortest-lived components is usually long enough that, even on Earth with easy access, it's often not worth the cost of attempting a repair. In an easy-to-access data centre, the components most likely to get replaced are hot-swappable drives or power supplies, but it's been about two decades since I last worked anywhere where anyone bothered to check for failed RAM or failed CPUs to salvage a server. And a lot of servers don't have network devices you can replace without soldering, and haven't for a long time outside of really high-end networking.

And at sufficient scale, once you plan for that, you can massively simplify the servers. The amount of waste a server case suitable for hot-swapping drives adds, if you're not actually going to use the capability, is massive.
