“Rust is safe” is not some kind of absolute guarantee of code safety

>>rvz+(OP)
As usual HN comments react to the headline, without reading the content.

A lot of modern userspace code, including Rust code in the standard library, thinks that invariant failures (AKA "programmer errors") should cause some sort of assertion failure or crash (Rust or Go `panic`, C/C++ `assert`, etc). In the kernel, claims Linus, failing loudly is worse than trying to keep going because failing would also kill the failure reporting mechanisms.

He advocates for a sort of soft-failure, where the code tells you you're entering unknown territory and then goes ahead and does whatever. Maybe it crashes later, maybe it returns the wrong answer, who knows, the only thing it won't do is halt the kernel at the point the error was detected.

Think of the following Rust API for an array, which needs to be able to handle the case of a user reading an index outside its bounds:

  struct Array<T> { ... }
  impl<T> Array<T> {
    fn len(&self) -> usize;

    // if idx >= len, panic
    fn get_or_panic(&self, idx: usize) -> T;

    // if idx >= len, return None
    fn get_or_none(&self, idx: usize) -> Option<T>;

    // if idx >= len, print a stack trace and return
    // who knows what
    unsafe fn get_or_undefined(&self, idx: usize) -> T;
  }

The first two are safe by the Rust definition, because they can't cause memory-unsafe behavior. The second two are safe by the Linus/Linux definition, because they won't cause a kernel panic. If you have to choose between #1 and #3, Linus is putting his foot down and saying that the kernel's answer is #3.

>>jmilli+Fb
The way to handle this is split up kernel work into fail-able tasks [1]. When a safety check (like array OOB) occurs, it unwinds the stack up to the start of the task, and the task fails.

Linus sounds so ignorant in this comment. As if no one else thought of writing safety-critical systems in a language that had dynamic errors, and that dynamic errors are going to bring the whole system down or turn it into a brick. No way!

Errors don't have to be full-blown exceptions with all that rigamarole, but silently continuing with corruption is utter madness and in 2022 Linus should feel embarrassed for advocating such a backwards view.

[1] This works for Erlang. Not everything needs to be a shared-nothing actor, to be sure, but failing a whole task is about the right granularity to allow reasoning about the system. E.g. a few dozen to a few hundred types of tasks or processes seems about right.

>>titzer+cf
It doesn't continue silently, it warns. More accurately, it does what you tell it to, which can also be a hard stop if you want to.

It's up to you to choose the right failure strategy and monitor your system if you don't want to panic, and take appropriate measures and not just ignore the warning.

It's not Linus who sounds ignorant here, it's the people applying user-space "best practices" to the kernel. If the kernel panics, the system is dead and you've lost the opportunity to diagnose the problem, which may be non-deterministic and hard to trigger on purpose.

zlacker