zlacker

I know next to nothing about kernel programming, but I'm not sure what Linus' objection to the comment he is responding to is.

The comment seemed to be making reference to rust's safety guarantees about undefined behaviour like use after free.

Linus seems to have a completely different definition of "safety" that conflates allocation failures, indexing out of bounds, and division by zero with memory safety. Rust makes no claims about those problems, and the comment clearly refers to undefined behaviour. Obviously, those other problems are real problems, but just not ones that Rust claims to solve.

Edit: Reading the chain further along, it increasingly feels like Linus is arguing against a strawman.

replies(3): >>pfortu+H >>arinle+d5 >>4bpp+R7

>>a_hume+(OP)
I am probably wrong, but I understood that “safety meaning panic” is neither “safe” nor allowed in the Linux kernel, because the kernel must not panic when an error arises.

replies(2): >>a_hume+12 >>rowanG+N4

>>pfortu+H
Which is why Rust has been accommodating the kernel by adding non-panicking versions of the functions that Linus has been complaining about (namely those that treat memory allocation as infallible, which isn't an unreasonable thing to assume in application code). That still doesn't change the fact that "safe" in this context has a technical meaning, and what Linus is describing isn't that.
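To make the "non-panic versions" idea concrete, here is a minimal userspace sketch using the standard library's fallible `Vec::try_reserve_exact`. The helper name `zeroed_buffer` is made up for illustration, and kernel Rust uses its own allocation APIs rather than `std`, so treat this as an analogy rather than what the kernel code looks like.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: build a zero-filled buffer of `len` bytes,
/// reporting allocation failure as an `Err` instead of panicking/aborting.
fn zeroed_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf: Vec<u8> = Vec::new();
    // Fallible reservation: returns Err on capacity overflow or allocator failure.
    buf.try_reserve_exact(len)?;
    // Capacity for `len` bytes is already reserved, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```

The caller decides what to do with the `Err`: retry, fall back to a smaller buffer, or propagate the error upward, which is roughly the choice the kernel wants to keep for itself.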

replies(2): >>Vt71fc+G4 >>layer8+C7

>>a_hume+12
Maybe his point is that the technical meaning should use a more accurate word, in his opinion?

replies(1): >>a_hume+O6

>>pfortu+H
Safety doesn't mean panic. I don't feel that was the point the person Linus responded to was making.

>>a_hume+(OP)
> I know next to nothing about kernel programming, but I'm not sure what Linus' objection to the comment he is responding to is.

You should read the email thread, as Linus explains in clear terms.

Take for instance Linus's insightful followup post:

https://lkml.org/lkml/2022/9/19/1250

replies(1): >>ChrisS+26

>>arinle+d5
What is better: continuing to "limp along" in some unknown corrupted state (aka undefined behaviour) or in a well defined (albeit invalid) state?

replies(3): >>throw8+7b >>Someon+Ze >>yencab+U51

>>Vt71fc+G4
His point seems to be the opposite, that "safety" should have a vaguer meaning in his opinion, and not the well established technical definition that the author clearly meant when he used the word.

replies(1): >>LtWorf+8a

>>a_hume+12
The issue that Linus is probably coming from is that many Rust aficionados evangelize for Rust as if the very specific technical meaning of “safe” in Rust was the generic meaning of “safe”. For those who understand the limitations and the trade-offs, that can be quite tiresome.

replies(1): >>a_hume+S8

>>a_hume+(OP)
From a quick skim, it seems to me that at least in Linus's interpretation, his interlocutor is requesting changes to the way the kernel does things in order to accommodate/maintain Rust's "there is no undefined behaviour; in cases where circumstances conspire to make behaviour undefined, terminate immediately" philosophy even in kernel Rust code. He then figures that if he said he is not willing to do that, the other side would respond with something to the effect of "but implementing the Rust philosophy in full means you get safety, and you surely can't have a goal more important than that", and therefore leaps to talking down the importance of the safety that Rust actually guarantees, to argue that it is not actually so great that all other objectives would be secondary to it.

If his initial interpretation and expectation of the Rustacean response is in fact correct, the line of argumentation does not seem per se wrong, but I do think that it is bad practice in adversarial conversations to do the thing where you silently skip forward several steps in the argument and respond to what you expect (their response to your response)^n to be instead of the immediate argument at hand.

>>layer8+C7
Except, the person he is responding to doesn't make those claims - though I haven't read further up the chain - only downwards.

>>a_hume+O6
Or, in other words, rust-safety should mean what safety means in every other context, or rust people need to come up with a different word.

replies(1): >>a_hume+He

>>ChrisS+26
Had the same topic often on MCUs: limp along to hopefully get the error out somehow, otherwise it won't be noticed unless a JTAG debugger is attached (and in the field, by default, none is).

So I can understand where Linus comes from.

replies(2): >>gmueck+qp >>mlindn+Ep

>>LtWorf+8a
You don't get to change the definition of a term used by another when it had a clear meaning in its use, and then make an argument on the basis that the author meant y when they clearly meant x. That is just conflation.

replies(2): >>LtWorf+Wk >>Vt71fc+ar

>>ChrisS+26
This question is answered in Linus' emails fully and better than I'm going to do.

But to restate briefly, the answer varies wildly between kernel and user programs, because a user program failing hard on corrupt state is still able to report that failure/bug, whereas a kernel panic is a difficult to report problem (and breaks a bunch of automated reporting tooling).

So in answer: Read the discussion.

replies(1): >>ChrisS+Ji

>>Someon+Ze
You seem to have misunderstood me. The distinction I'm making is not between kernel panic or undefined behaviour. The distinction is between undefined behaviour and defined behaviour. That defined behaviour can be anything, even including "limping on" somehow.
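As a rough illustration of "defined behaviour that can be anything", here is a sketch in safe Rust where out-of-bounds access, overflow, and division by zero all produce a well-defined `None` rather than a panic or undefined behaviour. The function `average_prefix` is a hypothetical example, not something from the mailing list thread.

```rust
/// Hypothetical helper: average of the first `n` samples.
/// Every failure mode maps to a defined `None`; nothing here panics.
fn average_prefix(samples: &[u64], n: usize) -> Option<u64> {
    // `get` returns None instead of panicking when n > samples.len().
    let prefix = samples.get(..n)?;
    // `checked_add` returns None on overflow instead of wrapping or panicking.
    let sum = prefix
        .iter()
        .try_fold(0u64, |acc, &x| acc.checked_add(x))?;
    // `checked_div` returns None when the divisor is zero (n == 0).
    // usize -> u64 is lossless on current 32- and 64-bit targets.
    sum.checked_div(prefix.len() as u64)
}
```

What the caller does with `None` (log and carry on, return an error code, or stop) is a policy decision separate from the question of whether the behaviour is defined.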

>>a_hume+He
I think the word "safety" existed before rust…

replies(1): >>a_hume+Xq

>>throw8+7b
Yes. You could still hard reset after the error is reported if you wanted to. And if system availability matters, a hardware watchdog would handle the case where the error handling doesn't finish.

>>throw8+7b
Limping along is what the salesman and the business people want as failures look bad.

Engineers should want the immediate stop, because that's safer, especially in safety critical situations.

replies(3): >>wtalli+Kt >>warinu+pf1 >>niscoc+6X1

>>LtWorf+Wk
This has nothing to do with the common definition of "safety". Terms change their meaning based upon their use and context. The author has a clear use in mind - memory safety.

The rules of argument existed long before the Linux kernel. You don't get to change terms introduced within an argument with a clear meaning because it helps you create a strawman. If you want to change the definition of a term mid-argument, you telegraph it. Once again, this is called conflation.

>>a_hume+He
>when it had a clear meaning in its use

That's not the issue though. It's that "safe" means something is actually safe. My house isn't safe if it's on fire, even if the house is in a safe neighborhood. Linus' claim is that "rust people" sometimes themselves conflate memory safety with general code safety, simply because "safe" is in the name. So much so that they will at times sacrifice code quality to achieve this goal despite (a) memory safety not being real safety and (b) there being no way to guarantee memory safety in the kernel anyway. What he is saying is that "rust people" (whatever that means) are at times trading off real safety or real code maintenance/performance for "rust safety."

>a compiler - or language infrastructure - that says "my rules are so ingrained that I cannot do that" is not one that is valid for kernel work.

And

>I think you are missing just how many things are "unsafe" in certain contexts and cannot be validated.

>This is not some kind of "a few special things".

>This is things like absolutely _anything_ that allocates memory, or takes a lock, or does a number of other things.

>Those things are simply not "safe" if you hold a spinlock, or if you are in a RCU read-locked region.

>And there is literally no way to check for it in certain configurations. None.

You can judge whether he is correct, but he never said Rust's safety implies absolute safety, only that some Rust users are treating it that way by sacrificing the code for it. If that's the case then it makes a lot of sense to start using a more sensible word like "guaranteed" instead of safe. I think part of what contributes to this idea is that "unsafe" code is written with the keyword "unsafe", as if code not written that way is safe and code written with "unsafe" is bad. That's not to say that "unsafe" actually implies any of that - all it means is that it's not guaranteed to be memory safe - but according to Linus it creates a certain mentality which is incongruent with the nature of kernel development. And the reason for that is that safe and unsafe are general English words with strong connotations such as:

>protected from or not exposed to danger or risk; not likely to be harmed or lost.

>uninjured; with no harm done.

And for unsafe:

>able or likely to cause harm, damage, or loss

>>mlindn+Ep
The kernel is not the whole system. The kernel needs to offer the "limping along" option so that the other parts of the system can implement whatever graceful failure method is appropriate for that system. There's no one size fits all solution for the kernel to pick.

>>ChrisS+26
What is better for a desktop user:

1) needing to reload a wifi driver to reinitialize hardware (with a tiny probability of memory corruption) OR choosing to reboot as soon as convenient (with a tiny probability of corrupting the latest saved files)

2) to lose unsaved files for sure and not even know what caused the crash

replies(2): >>Jweb_G+981 >>notaco+LP2

>>yencab+U51
The latter, because the "tiny probability of memory corruption" can easily become a CVE.

replies(1): >>P5fRxh+Gh1

>>mlindn+Ep
You sound like you code websites or something.

Real engineers, like say the people who code the machines that fly on Mars, don't want "oops that's unexpected, ruin the entire mission because that's safer". Same for the Linux kernel.

>>Jweb_G+981
We have a term for this.

FUD

replies(1): >>Jweb_G+al1

>>P5fRxh+Gh1
Linux has numerous CVEs, and a large percentage stem from memory corruption. That's not FUD, I'm afraid.

replies(1): >>scoutt+C72

>>mlindn+Ep
What are you talking about? Should planes stop flying when they encounter an error?

Safety critical systems will try to recover to a working state as much as possible. They are designed with redundancy, so that if one path fails, they can use path 2 or path 3 to reach a safe, usable state.

>>Jweb_G+al1
It's FUD. And not only that. The fear of constantly being attacked by an external entity is also paranoid.

replies(1): >>Jweb_G+til

>>yencab+U51
Why focus exclusively on the desktop, or over-generalize from it to other uses? What is appropriate for them is not necessarily so for the many millions of machines in server rooms and data centers. Also, you present a false dichotomy. "Lose unsaved files for sure" is not the case for many systems, and "not even know" is not necessarily the case. Logging during shutdown is a real thing, as is saving a crash dump for retrieval after reboot. Both have been standard at my last several projects and companies.

As I've said over and over, both approaches - "limp along" and "reboot before causing harm" - need to remain options, for different scenarios. Anyone who treats the one use case they're familiar with as the only one which should drive policy for everyone is doing the community a disservice.

replies(1): >>yencab+hW2

>>notaco+LP2
Yes, both need to remain options. Rust-in-kernel needs to be able to support both. That's like half of Linus's ranting there.

The other half is that the kernel has a lot of rules about what is safe to do where, and Rust has to be able to follow those rules, or not be used in those contexts. This is the GFP_ATOMIC part.
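For anyone unfamiliar with the GFP_ATOMIC part: in the kernel, a GFP_KERNEL allocation may sleep, while GFP_ATOMIC must not, so the latter is what you use while holding a spinlock or in other atomic context. Below is a toy Rust model of that rule. Every name in it (`Context`, `AllocMode`, `alloc_mode_for`, `check_alloc`) is hypothetical and is not a kernel or Rust-for-Linux API; it only shows the shape of "the caller has to say which context it is in".

```rust
/// Hypothetical model of where the code is running; not a kernel API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Context {
    Process, // ordinary process context: sleeping is allowed
    Atomic,  // e.g. holding a spinlock: sleeping is not allowed
}

/// Hypothetical stand-in for the GFP_KERNEL / GFP_ATOMIC distinction.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AllocMode {
    MaySleep, // analogous to GFP_KERNEL
    NoSleep,  // analogous to GFP_ATOMIC
}

/// Pick the allocation mode that is permitted in the given context.
fn alloc_mode_for(ctx: Context) -> AllocMode {
    match ctx {
        Context::Process => AllocMode::MaySleep,
        Context::Atomic => AllocMode::NoSleep,
    }
}

/// Reject the one combination that breaks the rule: a sleeping
/// allocation requested from atomic context.
fn check_alloc(ctx: Context, mode: AllocMode) -> Result<(), &'static str> {
    match (ctx, mode) {
        (Context::Atomic, AllocMode::MaySleep) => {
            Err("sleeping allocation requested in atomic context")
        }
        _ => Ok(()),
    }
}
```

The catch Linus points out is that the real kernel cannot always discover the context at runtime in every configuration, so in practice this information has to be carried by the caller rather than checked by the allocator.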

>>scoutt+C72
Unfortunately, whether you personally care about this sort of thing isn't good enough anymore. Owned Linux boxes on IoT devices are now being marshaled into massive botnets used to perform denial of service attacks, while other vulnerabilities are exploited to enable ransomware. You having negligent security on your own unpatched box because you don't personally feel like it's a good tradeoff has many negative external consequences. Fortunately, the decision isn't actually up to you (and having fewer vulnerabilities won't influence you negatively anyway, so I'm not sure why you're so angry about it).

replies(1): >>scoutt+Guo

>>Jweb_G+til
> why you're so angry about it

Am I?

You suppose a lot of things about me from literally a bunch of words.

"A 'tiny probability of memory corruption' can easily become a CVE" is still FUD, because it is simply not true in most cases. The words "tiny" and "easily" show the bias here.

The rest of the conversation seems a symptom of Hypervigilance: Fixation on potential threats (dangerous people, animals, or situations).

Fortunately, the decision isn't up to you either.