zlacker

[parent] [thread] 4 comments
1. wtalli+(OP)[view] [source] 2022-10-02 19:19:25
> Furthermore, for some domains - e.g. storage - it's the only sane option.

Can you elaborate on this? Because failing storage is a common occurrence that usually does not warrant immediately crashing the whole OS, unless it's the root filesystem that becomes inaccessible.

replies(1): >>notaco+b6
2. notaco+b6[view] [source] 2022-10-02 20:02:00
>>wtalli+(OP)
Depends on what you mean by "failing storage" but IMX it does warrant an immediate stop (with or without reboot depending on circumstances). Yes, for some kinds of media errors it's reasonable to continue, or at least not panic. Another option in some cases is to go read-only. OTOH, if either media or memory corruption is detected, it would almost certainly be unsafe to continue because that might lead to writing the wrong data or writing it to the wrong place. The general rule in storage is that inaccessible data is preferable to lost, corrupted, or improperly overwritten data.
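
To make that policy concrete, here's a rough sketch in C (illustrative only - the error kinds and names are mine, loosely modeled on the choices ext4's errors= mount option exposes: continue, remount-ro, or panic):

    #include <stdio.h>

    /* Illustrative policy sketch only - not actual kernel or ext4 code. */
    enum storage_error { ERR_BAD_SECTOR, ERR_FS_METADATA_CORRUPT, ERR_MEMORY_CORRUPT };
    enum action        { ACT_CONTINUE, ACT_REMOUNT_RO, ACT_STOP };

    static enum action on_storage_error(enum storage_error err)
    {
        switch (err) {
        case ERR_BAD_SECTOR:           /* isolated media error: log it, keep serving */
            return ACT_CONTINUE;
        case ERR_FS_METADATA_CORRUPT:  /* structures look wrong: stop all writes */
            return ACT_REMOUNT_RO;
        case ERR_MEMORY_CORRUPT:       /* can't trust what we'd write, or where */
        default:
            return ACT_STOP;           /* inaccessible beats corrupted */
        }
    }

    int main(void)
    {
        /* e.g. detected memory corruption maps to a full stop */
        return on_storage_error(ERR_MEMORY_CORRUPT) == ACT_STOP ? 0 : 1;
    }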

Especially in a distributed storage system using erasure codes etc., losing one machine means absolutely nothing even if it's permanent. On the last storage project I worked on, we routinely ran with 1-5% of machines down, whether it was due to failures or various kinds of maintenance actions, and all it meant was a loss of some capacity/performance. It's what the system was designed for. Leaving a faulty machine running, OTOH, could have led to a Byzantine failure mode corrupting all shards for a block and thus losing its contents forever.
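
For anyone who hasn't worked with erasure codes, the arithmetic behind "one machine means nothing" is simple. The numbers below are made up for illustration, not from that project:

    #include <stdio.h>

    int main(void)
    {
        /* A k-of-n erasure code: any k of the n shards reconstruct the block. */
        int k = 10, n = 14;          /* e.g. 10 data shards + 4 parity shards */
        int tolerated = n - k;       /* simultaneous shard losses that cost nothing */
        printf("with %d-of-%d coding, up to %d machines holding a block's shards\n"
               "can be down and the block is still fully reconstructible\n",
               k, n, tolerated);
        return 0;
    }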

BTW, in that sort of context - where most of the bytes in the world are held - the root filesystem is more expendable than any other. It's just part of the access system, much like firmware, and re-imaging or even hardware replacement doesn't affect the real persistence layer. It's the user data that is king, and it's those media whose contents must be treated with the utmost care.

replies(1): >>wtalli+6b
3. wtalli+6b[view] [source] [discussion] 2022-10-02 20:32:32
>>notaco+b6
I understand why a failing drive or apparently corrupt filesystem would be reason to freeze a filesystem. But that's nowhere close to kernel panic territory.

Even in a distributed, fault-tolerant multi-node system, it seems like it would be useful for the kernel to keep running long enough for userspace to notify other systems of the failure (e.g. return errors to clients with pending requests so they don't have to wait for a timeout before retrying from a different node), or at least to send logs to wherever you're aggregating them.

replies(1): >>notaco+Ke
4. notaco+Ke[view] [source] [discussion] 2022-10-02 20:55:11
>>wtalli+6b
In a system already designed to handle the sudden and possibly permanent loss of a single machine to hardware failure, those are nice to have at best. "Panic" doesn't have to mean not executing a single other instruction. Logging e.g. over the network is one of the things a system might do as part of its death throes, and definitely was for the last few such systems I worked on. What's important is that it not touch storage any more, or issue instructions to other machines to do so, or return any more possibly-corrupted data to other systems. For example, what if the faulty machine itself is performing block reconstruction when it realizes the world has turned upside down? Or if it returns a corrupted shard to another machine that's doing such reconstruction? In both of those scenarios the whole block could be corrupted even though that machine's local storage is no longer involved. I've seen both happen.

Since the mechanisms for ensuring the orderly stoppage of all such activity system-wide are themselves complicated and possibly error-prone, and more importantly not present in a commodity OS such as Linux, the safe option is "opt in" rather than "opt out". In other words, don't try to say you must stop X and Y and Z ad infinitum. Instead say you may only do A and B and nothing else. That can easily be accomplished with a panic, where certain parts such as dmesg are specifically enabled between the panic() call and the final halt instruction. Making that window bigger, e.g. to return errors to clients who don't really need them, only creates further potential for destructive activity to occur, and IMO is best avoided.
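
For what it's worth, that narrow window is a real mechanism in Linux: panic() runs a notifier chain before the machine halts, so a module can register a last-gasp callback (e.g. to try to push one line out over the network). A minimal sketch - the module and message names are mine, and exactly what still works at that point depends on kernel version and configuration:

    #include <linux/module.h>
    #include <linux/notifier.h>
    #include <linux/printk.h>
    #include <linux/panic_notifier.h>   /* older kernels declare the list via linux/kernel.h */

    /* Called from panic() before the machine halts; ptr is the panic message. */
    static int last_gasp(struct notifier_block *nb, unsigned long event, void *ptr)
    {
            pr_emerg("storage node going down hard: %s\n", (char *)ptr);
            return NOTIFY_DONE;
    }

    static struct notifier_block last_gasp_nb = {
            .notifier_call = last_gasp,
    };

    static int __init last_gasp_init(void)
    {
            atomic_notifier_chain_register(&panic_notifier_list, &last_gasp_nb);
            return 0;
    }

    static void __exit last_gasp_exit(void)
    {
            atomic_notifier_chain_unregister(&panic_notifier_list, &last_gasp_nb);
    }

    module_init(last_gasp_init);
    module_exit(last_gasp_exit);
    MODULE_LICENSE("GPL");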

Note that this is a fundamental difference between a user (compute-centric) view of software and a systems/infra view. It's actually the point Linus was trying to get across, even if he picked a horrible example. What's arguably better in one domain might be professional malfeasance in the other. Given the many ways Linux is used, saying that "stopping is not an option" is silly, and "continuing is not an option" would be equally so. My point is not that what's true for my domain must be true for others, but that both really are and must remain options.

P.S. No, stopping userspace is not stopping everything, and not what I was talking about. Or what you were talking about until the narrowing became convenient. Your reply is a non sequitur. Also, I can see from other comments that you already agree with points I have made from the start - e.g. that both must remain options, that the choice depends on the system as a whole. Why badger so much, then? Why equivocate on the importance (or even meaningful difference) between kernel vs. userspace? Heightening conflict for its own sake isn't what this site is supposed to be about.

replies(1): >>wtalli+dh
5. wtalli+dh[view] [source] [discussion] 2022-10-02 21:12:30
>>notaco+Ke
> "Panic" doesn't have to mean not executing a single other instruction.

We're talking specifically about the current meaning of a Linux kernel panic. That means an immediate halt to all of userspace.
