As I posted upthread, the right way to handle this is to make dynamic errors either throw exceptions or kill the whole task, and to split the critical work into tasks that can fail or complete as a whole, almost like transactions (see the sketch below). The idea that the kernel should just go on limping in a f'd up state is bonkers.
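To make that concrete, here's a minimal userspace Rust sketch of what I mean by "fail or complete as a whole" (the task name and error variants are made up for illustration, not any real kernel API):

```rust
// Minimal userspace sketch: every dynamic error fails the *task*,
// not the whole program, and the caller treats the task like a
// transaction that either completed or didn't.
#[derive(Debug)]
enum TaskError {
    BoundsCheck,      // a safety check tripped
    BadInput(usize),  // hypothetical domain error at this index
}

fn critical_task(data: &[u8], idx: usize) -> Result<u8, TaskError> {
    // Fallible accessor instead of data[idx], which would panic.
    let byte = *data.get(idx).ok_or(TaskError::BoundsCheck)?;
    if byte == 0 {
        return Err(TaskError::BadInput(idx));
    }
    Ok(byte)
}

fn main() {
    match critical_task(&[1, 2, 3], 7) {
        Ok(v) => println!("task completed: {v}"),
        // The task failed as a whole; the caller reports it and
        // moves on instead of the runtime killing the world.
        Err(e) => eprintln!("task failed, rolled back: {e:?}"),
    }
}
```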
I feel like we must have read two different articles. You sound crazy; I didn't read it your way at all.
> Think of that "debugging tools give a huge warning" as being the equivalent of std::panic in standard rust. Yes, the kernel will continue (unless you have panic-on-warn set), because the kernel MUST continue in order for that "report to upstream" to have a chance of happening.
"If the kernel shuts down the world, we don't get the bug report", seems like a pretty good argument. There are two options when you hit a panic in rust code:
* Panic and shut it all down. This prevents any reporting mechanism like a core dump. You cannot attach a normal debugger to the kernel.
* Ignore the panic and proceed, recording the fact that it failed and reporting the failure later (see the sketch after this list).
The kernel is a single program, so it's not as if you could just fork it before every Rust call and let the child fail in isolation.
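For the second option, userspace Rust can literally do this today with `std::panic::catch_unwind`. Note that this relies on stack unwinding, which kernel Rust (built with panic=abort) doesn't have, so take this as an analogy rather than a kernel recipe:

```rust
// Userspace sketch of option two: observe the panic, record that the
// operation failed, and keep running so the failure can be reported.
use std::panic;

fn main() {
    let result = panic::catch_unwind(|| {
        let v: Vec<u8> = Vec::new();
        v[10] // out-of-bounds index panics here
    });
    match result {
        Ok(b) => println!("got {b}"),
        // We still have a live process, so we can report the failure.
        Err(_) => eprintln!("operation failed; logged it, continuing"),
    }
}
```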
> In the kernel, "panic and stop" is not an option (it's actively worse than even the wrong answer, since it's really not debugable), so the kernel version of "panic" is "WARN_ON_ONCE()" and continue with the wrong answer.
(edit, and):
> Yes, the kernel will continue (unless you have panic-on-warn set), because the kernel MUST continue in order for that "report to upstream" to have a chance of happening.
Did I read that right? The kernel must continue? Yes, sure, absolutely... but maybe it doesn't need to continue with the next instruction; maybe it can continue in an error handler instead (the second shape in the sketch below)? Is his thinking really that narrow? I hope not.
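For what it's worth, here's a plain-Rust sketch of the two shapes being contrasted. `warn_once` just mimics the spirit of the kernel's WARN_ON_ONCE(); it isn't the kernel API:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static WARNED: AtomicBool = AtomicBool::new(false);

// Complain loudly the first time only, like WARN_ON_ONCE() in spirit.
fn warn_once(msg: &str) {
    if !WARNED.swap(true, Ordering::Relaxed) {
        eprintln!("WARNING (once): {msg}");
    }
}

// Shape 1: warn and continue with the next instruction, limping along
// on a fallback "wrong answer".
fn lookup_or_limp(table: &[u32], idx: usize) -> u32 {
    match table.get(idx) {
        Some(&v) => v,
        None => {
            warn_once("index out of range; substituting 0");
            0
        }
    }
}

// Shape 2: don't continue with the next instruction; hand the failure
// to an error handler by returning it up the stack.
fn lookup_or_bail(table: &[u32], idx: usize) -> Result<u32, &'static str> {
    table.get(idx).copied().ok_or("index out of range")
}

fn main() {
    let t = [10, 20, 30];
    println!("{}", lookup_or_limp(&t, 9)); // warns once, prints 0
    if let Err(e) = lookup_or_bail(&t, 9) {
        eprintln!("handled in an error handler: {e}");
    }
}
```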
> * Panic and shut it all down. This prevents any reporting mechanism like a core dump. You cannot attach a normal debugger to the kernel.
No one is really advocating that. Clearly you need to be able to write code that fails at a smaller granularity than the whole kernel. See my comment upthread about what I mean by that: dynamic errors fail smaller-granularity tasks, and handlers deal with tasks that fail because a safety check went bad.
> dynamic errors fail smaller-granularity tasks, and handlers deal with tasks that fail because a safety check went bad.
Yes, and that's why Rust is bad here (though it doesn't have to be). Rust _forces_ you to stop the whole world when such an error occurs: you cannot fail at a smaller granularity, you have to panic, period. That's why it's being criticized here; it doesn't give you any other granularity. The top comment has some alternatives that still work in Rust; one is sketched below.
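I can't reproduce the top comment here, but one pattern it's presumably pointing at is that the standard library pairs most panicking operations with fallible ones, so the caller picks the failure granularity. For example:

```rust
// checked_div returns None on divide-by-zero instead of panicking,
// and checked_mul returns None on overflow instead of panicking
// (in debug builds) or silently wrapping (in release builds).
fn scaled_average(sum: u64, count: u64, scale: u64) -> Option<u64> {
    sum.checked_div(count)?.checked_mul(scale)
}

fn main() {
    assert_eq!(scaled_average(10, 2, 3), Some(15));
    assert_eq!(scaled_average(10, 0, 3), None);       // no panic
    assert_eq!(scaled_average(u64::MAX, 1, 2), None); // no overflow panic
}
```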
Rust needs to fix that then. So we agree on that.
The issue being discussed here is that Rust comes from a perspective where errors can be classified and error handling can be automated. The kernel doesn't work like that: we're working under more constraints than in userland, including hardware that doesn't behave the way it was expected to.
There is enormous variation in the output targets for a panic on Linux: graphics hardware attached to PCIe (requires a graphics driver and possibly support from the PCIe bus master, I don't know), a serial interface (USART driver), serial over USB (the serial-over-USB driver, the USB protocol stack, the USB root hub driver, and whatever bus that is attached to)... There is a very real chance that the error-reporting path hits the same issue (e.g. some inconsistent data on the kernel heap) while reporting it, which would leave the developers with no information to work from if the kernel traps itself in an endless error-handling loop.
But... this isn't true??