zlacker

[parent] [thread] 16 comments
1. titzer+(OP)[view] [source] 2022-10-02 15:53:46
The way to handle this is to split up kernel work into fail-able tasks [1]. When a safety check fails (like an array OOB access), it unwinds the stack up to the start of the task, and the task fails.

Linus sounds so ignorant in this comment. As if no one else thought of writing safety-critical systems in a language that had dynamic errors, and that dynamic errors are going to bring the whole system down or turn it into a brick. No way!

Errors don't have to be full-blown exceptions with all that rigamarole, but silently continuing with corruption is utter madness and in 2022 Linus should feel embarrassed for advocating such a backwards view.

[1] This works for Erlang. Not everything needs to be a shared-nothing actor, to be sure, but failing a whole task is about the right granularity to allow reasoning about the system. E.g. a few dozen to a few hundred types of tasks or processes seems about right.
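
A minimal user-space sketch of the idea, using Rust's `std::panic::catch_unwind` as the task boundary (an in-kernel version would need its own unwinder, and `run_task` is a hypothetical name):

```rust
use std::panic::{self, AssertUnwindSafe};

// A failed safety check: slice indexing panics on out-of-bounds.
fn element(data: &[i32], i: usize) -> i32 {
    data[i]
}

// A "task" is any closure; if a safety check inside it panics,
// the unwind stops at the task boundary and only that task fails.
fn run_task<F: FnOnce() -> i32>(task: F) -> Result<i32, ()> {
    panic::catch_unwind(AssertUnwindSafe(task)).map_err(|_| ())
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // silence the default panic message
    let data = [1, 2, 3];
    assert!(run_task(|| element(&data, 10)).is_err()); // this task fails...
    assert_eq!(run_task(|| element(&data, 0)), Ok(1)); // ...the system lives on
}
```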

replies(3): >>jmilli+I >>fritol+H5 >>mike_h+I8
◧
2. jmilli+I[view] [source] 2022-10-02 15:58:38
>>titzer+(OP)
I think Linus's response would be that those failable tasks are called "processes", and the low-level supervisor that starts + monitors them is the kernel. If you have code that might fail and restart, it belongs in userspace.

If you want to run an Erlang-style distributed system in the kernel then that's an interesting research project, but it isn't where Linux is today. You'd be better off starting with SeL4 or Fuchsia.

replies(1): >>titzer+k1
◧◩
3. titzer+k1[view] [source] [discussion] 2022-10-02 16:01:48
>>jmilli+I
40 years of microkernels, of which I know Linus is aware, beg to differ. Maybe it's Linus's extreme opposition to microkernels talking, ostensibly because they have historically had a little lower performance--I dunno--but my comment should not be read as "yes, you must have a microkernel". There are cheaper fault isolation mechanisms than full-blown separate processes. Just having basic stack unwinding and failing a task would be a start.
replies(5): >>pca006+M2 >>jstimp+t3 >>Wastin+b4 >>geertj+c6 >>lr1970+UK
◧◩◪
4. pca006+M2[view] [source] [discussion] 2022-10-02 16:09:33
>>titzer+k1
I think it might be a bit more complicated than that, considering you can have static data, and unwinding the stack will not reset that state. I guess you still need some sort of task-level abstraction and to reset all the data for that task when unwinding from it. Btw, do we need stack unwinding, or can we just do a setjmp/longjmp (sjlj)?
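
The static-data point shows up directly in a small user-space Rust sketch: unwinding runs destructors for stack values but leaves static state exactly as the failed task left it, so a task abstraction would still have to reset it (names here are made up):

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicU32, Ordering};

// Shared "static data": stack unwinding does not roll this back.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn failing_task() {
    COUNTER.fetch_add(1, Ordering::SeqCst); // partial update to static state...
    panic!("safety check failed");          // ...then the task is killed
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // silence the default panic message
    let _ = panic::catch_unwind(AssertUnwindSafe(failing_task));
    // The task's stack is gone, but the static update survived the unwind:
    assert_eq!(COUNTER.load(Ordering::SeqCst), 1);
}
```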
replies(1): >>titzer+H3
◧◩◪
5. jstimp+t3[view] [source] [discussion] 2022-10-02 16:13:08
>>titzer+k1
How do you unwind if most of your kernel is written in C? (Answering my own question: they are doing stack unwinding, only manually.)

Where do you unwind to if memory is corrupted?

I don't think we're talking about what would be exception handling in other languages. I believe it's asserts. How do userland processes handle a failed assertion? Usually the process is terminated, after giving a debugger the chance to examine the state first, or dumping core.

And that's similar to what they are doing in the kernel. Only in the kernel it's more dangerous, because there is limited process/task isolation. I think that is an argument that taking down "full-blown separate processes" might not even be enough in the kernel.

◧◩◪◨
6. titzer+H3[view] [source] [discussion] 2022-10-02 16:14:37
>>pca006+M2
Note I am not going to advocate for try ... catch ... finally, because I think that language mechanism is abused out the wazoo for handling all kinds of things, like IO errors, but this is exactly what try ... finally would be for.

Regardless, I think just killing the task instantly, even with partial updates to memory, would be totally fine. It'd be cheap, whereas automatically undoing the updates (effectively a transaction rollback) is still too expensive. Software transactional memory just comes with too much overhead.

I vote "kill and unwind" and then dealing with partial updates has to be left to a higher level.
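
For what it's worth, Rust's stand-in for `try ... finally` is a drop guard. A sketch (names hypothetical) of "kill and unwind": cleanup in destructors still runs during the unwind, while partial data updates are left for a higher level to deal with:

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

static RELEASED: AtomicBool = AtomicBool::new(false);

// The `finally` part: a guard whose destructor runs when the
// stack unwinds past it, panic or not.
struct Guard;
impl Drop for Guard {
    fn drop(&mut self) {
        RELEASED.store(true, Ordering::SeqCst); // e.g. release a lock
    }
}

fn task() {
    let _g = Guard;           // "acquire"
    panic!("kill this task"); // unwinding still runs Guard::drop
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // silence the default panic message
    let _ = panic::catch_unwind(AssertUnwindSafe(task));
    assert!(RELEASED.load(Ordering::SeqCst)); // cleanup ran during the unwind
}
```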

◧◩◪
7. Wastin+b4[view] [source] [discussion] 2022-10-02 16:16:34
>>titzer+k1
Sorry it’s hard to take you seriously after that.

Linux isn’t a microkernel. If you want to work on a microkernel, go work on Fuchsia. It’s interesting research but utterly irrelevant to the point at hand.

Anyway, the microkernel discussion has been happening for three decades now. They haven’t historically had a little lower performance. They had garbage performance, to the point of being unsuitable in the 90s.

Plenty of kernel code can’t be written so as to be unwindable. That’s the issue at hand. In a fantasy world it might have been written as such, but that’s not the world we live in, which is what matters to Linus.

replies(3): >>jeffre+Kl >>pjmlp+ho >>rleigh+tF
◧
8. fritol+H5[view] [source] 2022-10-02 16:24:27
>>titzer+(OP)
What if the task was invoked asynchronously (and maybe it keeps happening.) What does async stack unwinding entail in Rust? Is there a parent-child relationship between invoker and invokee? async scopes (https://rust-lang.github.io/wg-async/vision/roadmap/scopes.h...) ? I've not touched Rust at all.
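
Not an async answer, but outside async Rust already has this parent-child relationship for threads: a panic unwinds only the invokee's stack, and the invoker observes it through the `JoinHandle` (async runtimes such as Tokio report a child task's panic through a similar join handle). A minimal sketch:

```rust
use std::thread;

fn main() {
    // The parent (invoker) holds a JoinHandle for the child (invokee);
    // a panic in the child unwinds only the child's stack.
    let child = thread::spawn(|| {
        let v: Vec<i32> = Vec::new();
        v[3] // out of bounds: panics in the child only
    });

    // The parent sees the failure as an Err from join().
    assert!(child.join().is_err());
}
```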
◧◩◪
9. geertj+c6[view] [source] [discussion] 2022-10-02 16:26:54
>>titzer+k1
> Just having basic stack unwinding and failing a task would be a start.

As the sibling comment pointed out, if you extend this idea to clean up all state, you end up with processes.

I do have some doubts about the no-panic rule. But instead of emulating processes in the kernel, I’d rather see a firmware-like subsystem whose only job is to export core dumps from the local system, after which the kernel is free to panic.

As a general point, and in my view (I agree this is an appeal to authority), Linus has this uncanny ability to find compromises between practicality and theory that result in successful real-world software. He’s not always right, but he’s almost never completely wrong.

◧
10. mike_h+I8[view] [source] 2022-10-02 16:39:00
>>titzer+(OP)
It doesn't continue silently, it warns. More accurately, it does what you tell it to, which can also be a hard stop if you want to.

It's up to you to choose the right failure strategy and monitor your system if you don't want to panic, and to take appropriate measures rather than just ignore the warning.

It's not Linus who sounds ignorant here, it's the people applying user-space "best practices" to the kernel. If the kernel panics, the system is dead and you've lost the opportunity to diagnose the problem, which may be non-deterministic and hard to trigger on purpose.
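
A user-space sketch of that "you choose the failure strategy" point, loosely modeled on the kernel's WARN_ON/BUG_ON split (all names below are made up for illustration):

```rust
#[derive(Clone, Copy)]
enum OnFailure {
    Warn,  // record it and keep the system alive for diagnosis
    Panic, // hard stop
}

// Returns whether the check passed; on failure, does whatever the
// caller's chosen strategy dictates.
fn check(cond: bool, strategy: OnFailure, log: &mut Vec<String>) -> bool {
    if cond {
        return true;
    }
    match strategy {
        OnFailure::Warn => {
            log.push("WARNING: check failed, continuing".to_string());
            false
        }
        OnFailure::Panic => panic!("check failed"),
    }
}

fn main() {
    let mut log = Vec::new();
    // Warn strategy: the failure is recorded, execution continues,
    // and the system stays up so the problem can be diagnosed.
    assert!(!check(false, OnFailure::Warn, &mut log));
    assert_eq!(log.len(), 1);
}
```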

replies(1): >>jeffre+om
◧◩◪◨
11. jeffre+Kl[view] [source] [discussion] 2022-10-02 17:49:29
>>Wastin+b4
Well, QNX is running on a gazillion devices, even resource-restricted ones, without problems. It can be slower, but it does not always have to be. That is far from being a fantasy world.
◧◩
12. jeffre+om[view] [source] [discussion] 2022-10-02 17:52:35
>>mike_h+I8
I agree with your statements, but I wonder: who is typically warned? An end user, via a log they neither read nor understand? The chance that this will lead to the right measure is low, isn't it?
replies(1): >>Gibbon+7U
◧◩◪◨
13. pjmlp+ho[view] [source] [discussion] 2022-10-02 18:03:03
>>Wastin+b4
QNX and INTEGRITY customers would beg to differ.
◧◩◪◨
14. rleigh+tF[view] [source] [discussion] 2022-10-02 19:58:14
>>Wastin+b4
Others have mentioned QNX. There is also ThreadX, which is a "picokernel". Both are certified for use in safety-critical domains. There are other options as well. Segger do one, for example, and there's also SafeRTOS, and others.

"Performance" is a red herring. In a safety-critical system, what matters is the behaviour and its consistency. ThreadX provides timing guarantees which Linux cannot, and all of the system threads are executed in strict priority order. It works extremely well, and the result is a system whose behaviour one can understand exactly, which is important for validating that it is functioning correctly. Simplicity equates to reliability. It doesn't matter if it's "slow" so long as it's consistently slow. If it meets the product requirements, then it's fine. And when you do the board design, you'll pick a part appropriate to the task at hand to meet the timing requirements.

Anyway, systems like ThreadX provide safety guarantees that Linux will never be able to match. But the interface is not POSIX. And for dedicated applications that's OK. It's not a general-purpose OS, and that's OK too. There are good reasons not to use complex general-purpose kernels in safety-critical systems.

IEC 62304 and ISO 13485 are serious standards for serious applications, where faults can be life-critical. You wouldn't use Linux in this context. No matter how much we might like Linux, you wouldn't entrust your life to it, would you? Anyone who answered "yes" to that rhetorical question should not be trusted with writing safety-critical applications. Linux is too big and complex to fully understand and reason about, and as a result it is impossible to validate properly in good faith. You might use it in an ancillary system in a non-safety-critical context, but you wouldn't use it anywhere where safety really mattered. IEC 62304 is all about hazards and risks, and risk mitigation. You can't mitigate risks you can't fully reason about, and any given release of Linux has hundreds of silly bugs in it, on top of very complex behaviours we can't fully understand even when they are correct.

replies(1): >>Wastin+EN
◧◩◪
15. lr1970+UK[view] [source] [discussion] 2022-10-02 20:30:54
>>titzer+k1
> 40 years of microkernels, of which I know Linus is aware of, beg to differ.

For better or worse, Linux is NOT a microkernel. Therefore, sound microkernel wisdom is not applicable to Linux in its present form. The "impedance match" of any new language added to the Linux kernel is driven by what the current kernel code in C is doing. This is essentially a Linux kernel limitation. If Rust cannot adapt to these requirements, it is a mismatch for Linux kernel development. For other kernels, like Fuchsia, Rust is a good fit. BTW, the core Fuchsia kernel itself is still in C++.

◧◩◪◨⬒
16. Wastin+EN[view] [source] [discussion] 2022-10-02 20:48:01
>>rleigh+tF
Sorry, I’m a bit lost regarding your comment. The discussion was about code safety in Linux in the context of potentially introducing Rust. I don’t really see the link with microkernels in the context of safety-oriented RTOSes. I think you are reacting to my comment about microkernel performance in the 90s, which I maintain.

Neither QNX nor ThreadX is intended to be a general-purpose kernel. I haven’t looked into it for a long time, but QNX’s performance used to not be very good. It’s small. It can boot fast. It gives you guarantees regarding return times. Everything you want from an RTOS in a safety-critical environment. It’s not very fast, however, which is why it never tried to move towards the general market.

◧◩◪
17. Gibbon+7U[view] [source] [discussion] 2022-10-02 21:32:32
>>jeffre+om
The couple of times I had to go digging into the kernel, what it looked like to me was a very large bare-metal piece of firmware. As someone who writes firmware, the very last thing you ever want is for it to hang or reset without reporting any diagnostics, because you have no idea where the offending code is. I'll belabor the point for people who think a large program is a few thousand lines: the kernel is millions of lines of code, mostly written by other people.

Small rant: ARM Cortex processors overwrite the stack pointer on reset. That's very, very dumb, because after the watchdog trips you have no idea what the code was doing, which means you can't report what the code was doing when the reset happened.

[go to top]