zlacker

[return to "HN is up again"]
1. sillys+z 2022-07-08 20:34:23
>>tpmx+(OP)
HN was down because the failover server also failed: https://twitter.com/HNStatus/status/1545409429113229312

Double disk failure is improbable but not impossible.

The most impressive thing is that there seems to be almost no data loss whatsoever. Whatever the backup system is, it seems rock solid.

◧◩
2. davedu+b2 2022-07-08 20:41:05
>>sillys+z
> Double disk failure is improbable but not impossible.

It's not even improbable if the disks are the same kind purchased at the same time.

◧◩◪
3. spiffy+c7 2022-07-08 21:00:26
>>davedu+b2
Yep: if you buy a pair of disks together, there's a fair chance they'll both be from the same manufacturing batch, which correlates with disk defects.
◧◩◪◨
4. clinto+4a 2022-07-08 21:14:24
>>spiffy+c7
This makes total sense but I've never heard of it. Is there any literature or writing about this phenomenon?

I guess proper redundancy also means having different brands of equipment in some cases.

◧◩◪◨⬒
5. toast0+8k 2022-07-08 21:53:32
>>clinto+4a
I also don't know of any literature on this phenomenon, but I recall HP had two different SSD recalls because the drives would fail when the uptime counter rolled over. That's not even load dependent; it only depends on whether you got a batch and powered them all on at the same time. Unfortunately, high uptime causing issues isn't that unusual for storage.
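
To make the rollover point concrete, here's a rough sketch of the arithmetic. The 16-bit signed hours counter is purely an assumption for illustration (the comment above doesn't say what width the counter was), and the function names are made up. The point is only that the failure time depends on power-on hours, so drives switched on together hit the limit together.

```python
# Assumption for illustration only: power-on hours are kept in a
# signed 16-bit integer, so the counter overflows at 2**15 hours.
COUNTER_BITS = 16
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25


def hours_until_rollover(bits: int = COUNTER_BITS) -> int:
    """Power-on hours at which a signed counter of this width overflows."""
    return 2 ** (bits - 1)


def years_until_rollover(bits: int = COUNTER_BITS) -> float:
    """The same limit expressed in years of continuous uptime."""
    return hours_until_rollover(bits) / HOURS_PER_DAY / DAYS_PER_YEAR


def rollover_times(power_on_offsets_hours, bits: int = COUNTER_BITS):
    """Wall-clock hour at which each drive hits the limit.

    power_on_offsets_hours: how many hours after the first drive each
    drive was powered on (0 for drives switched on together).
    """
    limit = hours_until_rollover(bits)
    return [offset + limit for offset in power_on_offsets_hours]


if __name__ == "__main__":
    print(hours_until_rollover())
    print(round(years_until_rollover(), 2))
    # Two drives powered on together reach the limit in the same hour;
    # a drive added 1000 hours later gets 1000 hours of slack.
    print(rollover_times([0, 0, 1000]))
```

Under that assumption, a mirrored pair powered on at the same moment loses both halves in the same hour, no matter how lightly loaded they are.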

It's not always easy, but if you can, you want manufacturer diversity, batch diversity, maybe firmware version diversity[1], and power on time diversity. That adds a lot of variables if you need to track down issues though.

[1] you don't want to have versions with known issues that affect you, but it's helpful to have different versions to diagnose unknown issues.
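
A minimal sketch of how you might audit a redundancy group for the kinds of diversity listed above. Everything here is hypothetical: the `Drive` fields, the `audit` helper, and the one-week `hours_window` default are illustrative choices, not any real tool's API. In practice you'd populate the fields from whatever inventory or SMART data you have.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Drive:
    # Hypothetical inventory record; field names are illustrative.
    name: str
    vendor: str
    batch: str
    firmware: str
    power_on_hours: int


def shared_risk_factors(a: Drive, b: Drive, hours_window: int = 168):
    """List the attributes two drives have in common.

    hours_window: power-on times closer than this many hours count as
    "powered on together" (168 hours = one week, an arbitrary default).
    """
    shared = []
    if a.vendor == b.vendor:
        shared.append("vendor")
    if a.batch == b.batch:
        shared.append("batch")
    if a.firmware == b.firmware:
        shared.append("firmware")
    if abs(a.power_on_hours - b.power_on_hours) <= hours_window:
        shared.append("power_on_hours")
    return shared


def audit(drives, hours_window: int = 168):
    """Map each pair of drive names to the risk factors they share."""
    report = {}
    for a, b in combinations(drives, 2):
        shared = shared_risk_factors(a, b, hours_window)
        if shared:
            report[(a.name, b.name)] = shared
    return report


if __name__ == "__main__":
    mirror = [
        Drive("sda", "VendorA", "B-001", "1.0", 20000),
        Drive("sdb", "VendorA", "B-001", "1.0", 20010),
        Drive("sdc", "VendorB", "Z-777", "3.2", 5000),
    ]
    for pair, factors in audit(mirror).items():
        print(pair, factors)
```

A pair that shares every factor is the worrying case. As noted above, more diversity also means more variables when you're tracking down an issue, so this is a trade-off, not a rule.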
