> You need to realize that
> (a) reality trumps fantasy
?
The comment seemed to be making reference to rust's safety guarantees about undefined behaviour like use after free.
Linus seems to have a completely different definition of "safety" that conflates allocation failures, indexing out of bounds, and division by zero with memory safety. Rust makes no claims about those problems, and the comment clearly refers to undefined behaviour. Obviously, those other problems are real problems, but just not ones that Rust claims to solve.
Edit: Reading the chain further along, it increasingly feels like Linus is arguing against a strawman.
I wouldn't be so sure about that. Getting the wrong answer can be a serious security problem. Not completing the operation... well, it is not good, but that's it.
Linus may view his job as "Saying No" but the way he does it still leaves a little to be desired, because his reasoning is sound here, but it's less "Follow my reasoning" than "You don't want to get yelled at again do you?"
[0]: https://lore.kernel.org/lkml/CAFRnB2VPpLSMqQwFPEjZhde8+-c6LL...
Have you missed the years and years of people criticizing Linus for his communication style?
When people say "safe" there's a pretty precise meaning and it's not this.
Yes, anyone who believes rust is 100% "safe" (by any definition) is wrong. That's not something you learn in Kindergarten though, it's actually about understanding that Rice's Theorem is a generalization of the Halting Problem.
> So this is something that I really need the Rust people to understand. That whole reality of "safe" not being some absolute thing
The irony of Linus lecturing anyone on safety lol anyway "the Rust people" know this already, when they say "safe" they mean "memory safe" - https://en.wikipedia.org/wiki/Memory_safety
Anyway, dumb shit like this is why I've always been quietly dreading Rust in the kernel.
a) The kernel will never be safe software because the mainline developers don't want it to be or even know what safe means
b) It just invites more posts like this and puts Rust closer to one of the most annoying software communities
> Or, you know, if you can't deal with the rules that the kernel requires, then just don't do kernel programming.
Agreed on this point. I was very interested in kernel dev earlier in my career until I actually started to engage with it.
It is sometimes acceptable to get wrong output. But it is nearly always better to know it is wrong.
edit:
> Yeah I was just trying to provide a clear definition, I didn't think you were implying it was BS.
(would have replied but I'm rate limited on HN - thanks dang!)
Sometimes you need to stress the difference between “my opinion” (as in “Kernel development requires greater safety standards”) and “facts” (“safety does not exist in software because it does not control the hardware aspect”).
You aren't safe on the FOB, in your car, in your barracks, or in your house. There are only degrees of safety. Very wise almost globally applicable words.
I know I was a little surprised when I was learning Swift after hearing it was called safe only to experience crashes with array OOB. Took some explanation and thinking to understand what was meant by safe.
If he were any other person, he’d have been axed a long time ago for this behavior.
I don’t understand how people put up with this kind of toxicity, even from him.
EFI is FAT, FAT is not journaled. You almost certainly have EFI these days.
https://en.wikipedia.org/wiki/Kdump_(Linux)
He also mentions that programs can report problems automatically to the distro devs. For example:
https://retrace.fedoraproject.org/faf/problems/
A kernel dump is not something you always want to upload since it can be large and contain sensitive info. I'm not a kernel dev though.
Of course, for this to work the fresh kernel must be able to come up and do that without itself crashing, so it can't capture every scenario. And it does bring the system down completely; there are lots of pros and cons to argue about that versus attempting to continue or limp along.
He gets a lot more leeway than being the creator of Linux should afford someone.
It would be cool if kernel Rust could implement a panic handler which just killed the offending module, but I’m assuming from the discussion around panics that this isn’t possible.
From the closing paragraph, I feel like he’s under the impression that Rust-advocating contributors are putting Rust’s interests (e.g. “legitimizing it” by getting it in the kernel) above the kernel itself.
None of that is going to save us from bad code.
Some of the biggest systems that run the world are not written with either safe code nor strongly typed languages.
Yes I would say strongly typed languages and memory safe languages help make coding easier and indeed save time and some bugs.
But when you get past making the kinds of errors that cause memory problems or bad types…
You are still left with 95% of the bugs and logic errors anyway.
Still, 5% savings in productivity is not nothing.
Yes, we know. We get it. Rust is not an absolute guarantee of safety and doesn’t protect us from all the bugs. This is obvious and well-known to anyone actually using Rust.
At this point, the argument feels like some sort of ideological debate happening outside the realm of actually getting work done. It feels like any time someone says that Rust defends against certain types of safety errors, someone feels obligated to pop out of the background and remind everyone that it doesn’t protect against every code safety issue.
Inclusivity and non-hostile work environments should not be considered “perfect” and “all-inclusive”. They should be basic. The default. The lowest bar possible.
I thought that he had apologised and regretted being hostile in comments. Apparently not. Not that I have much of an issue with ranty colorful language, but you need to also be right and have a legitimate cause to pull it off...
The point he makes is BS. "the reality is that there are no absolute guarantees. Ever" Yeah, DUH! The compiler could have bugs and soundness issues for example.
The point is you don't need "absolute guarantees" just "way safer and which dozens more classes of issues discovered automatically" is already enough. The other guy didn't write about "absolute guarantees". He said "WE'RE TRYING to guarantee the absence of undefined behaviour". That's an aim, not a claim they've either achieved it, or they can achieve it 100%
>Even "safe" rust code in user space will do things like panic when things go wrong (overflows, allocation failures, etc). If you don't realize that that is NOT some kind of true safely, I don't know what to say.
Well, if Linus doesn't realize this is irrelevant to the argument the parent made and the intention he talked about, I don't know what to say...
It will be written to on every kernel update and every initramfs update at least, which is what.. once a week on average?
A reply like yours is not so subtly indicating that "it's fine to panic all the time because ultimately you might be fine if you get a panic", which I fundamentally disagree with, other concerns aside.
Also you're suggesting that journaling filesystems are perfect and never lose data, which is also very untrue, in the default case they only protect metadata but there are still circumstances where they can lose data anyway; they're more resilient, not immune.
You should read the email thread, as Linus explains it in clear terms.
Take for instance Linus's insightful followup post:
Now we got a first glimpse at what happens.
Still, I find it strange that it never seemed to come up in preparation to the first Rust merges. Were there any conflict resolution strategies in place (that I don't know about) or just "we flame it out on LKML"?
Fault tolerant - you get a fault, you keep moving.
Fail safe - you fail, and thus all operations are stopped.
Sure, but people use this logic to justify no safety at all. Find me a marine that goes into war totally naked.
Other software dictators do exactly the same, but in a more underhanded and bureaucratic manner, which is worse. Yet their disciples call them "benevolent".
I can deal with Linus, but not with the latter. Linus strikes me as not being really serious or vindictive. It's just a colorful way of expressing himself.
Like with any emerging technology, early adopters become advocates because they’re convinced of the technology’s superiority. Once they organize into a community and get to know each other personally, then at least some of the motivation shifts: you want to see your friends succeed, you want to be part of a community that is making change, you want your early adoption to be “validated” by mainstream success, etc.
This can cloud technical judgment (not saying this is happening here, but if it were, it wouldn’t be surprising)
Never used Rust before but is there a way to supply some default code to run in such a situation instead of just not carrying out the bad operation?
Rust provides certain guarantees of memory safety, which is great, but it's important to understand exactly what that means and not to oversell it.
>>>> No (Linus)
>>> As you know, we're trying to guarantee the absence of undefined behaviour for code written in Rust. And the context is _really_ important, so important that leaving it up to comments isn't enough.
…
>>> Do you have an opinion on the above?
>> This message. Ie. No. you can’t make everyone play by your rules. (Linus, grumpily)
> While I disagree with some of what you write, the point is taken.
> But I won't give up on Rust guarantees just yet, I'll try to find ergonomic ways to enforce them at compile time.
I mean, it doesn’t sound like he’s being petty or misunderstanding.
They want special rules (which won’t work) to do runtime checking for rust code. That seems weird, right?
Rust safety should be compile time. That’s the point…
I dunno, maybe I don’t understand what’s being said, but I don’t think Linus is particularly wrong here, even if it’s kind of shouty.
I think it's all part of the language maturing process. Give it time, zealots will either move on to something new (and then harass the rust community for not meeting their new standard of excellence) or simmer down and get to work.
> Not completing the operation at all, is not really any better than getting the wrong answer, it's only more debuggable.
What Linus is saying is 100% right of course - he is trying to set expectations straight: just because you replaced C code, refined over thousands (or whatever huge number) of man-months of effort and correction, with Rust code, it doesn't mean absolute safety is guaranteed. For a kernel guy like him, Rust panic-aborting on overflows/alloc fails etc. is analogous to the kernel's C code detecting a double free and warning about it. To the kernel that is not safety at all - as he points out, it is only more debuggable.
He is allowing Rust in the kernel so he understands the fact that Rust allows you to shoot yourself in the foot a lot less than standard C - he is merely pointing out the reality that in kernel space or even user space that does not equate to absolute total safety. And as a chief kernel maintainer he is well within his rights to set the expectation straight that tomorrow's kernel-rust programmers write code with this point in mind.
(IOW as an example he doesn't want to see patches in Rust code that ignore kernel realities for Rust's magical safety guarantee - directly or indirectly allocating large chunks of memory may always fail in the kernel and would need to be accounted for even in Rust code.)
I mean the post Linus initially responded to did contain[1] a patch removing a kernel define, asking if anyone had any objections over removing that define, just to make the resulting Rust code a little nicer looking.
A lot of modern userspace code, including Rust code in the standard library, thinks that invariant failures (AKA "programmer errors") should cause some sort of assertion failure or crash (Rust or Go `panic`, C/C++ `assert`, etc). In the kernel, claims Linus, failing loudly is worse than trying to keep going because failing would also kill the failure reporting mechanisms.
He advocates for a sort of soft-failure, where the code tells you you're entering unknown territory and then goes ahead and does whatever. Maybe it crashes later, maybe it returns the wrong answer, who knows, the only thing it won't do is halt the kernel at the point the error was detected.
Think of the following Rust API for an array, which needs to be able to handle the case of a user reading an index outside its bounds:
    struct Array<T> { ... }

    impl<T> Array<T> {
        fn len(&self) -> usize;

        // if idx >= len, panic
        fn get_or_panic(&self, idx: usize) -> T;

        // if idx >= len, return None
        fn get_or_none(&self, idx: usize) -> Option<T>;

        // if idx >= len, print a stack trace and return
        // who knows what
        unsafe fn get_or_undefined(&self, idx: usize) -> T;
    }
The first two are safe by the Rust definition, because they can't cause memory-unsafe behavior. The second two are safe by the Linus/Linux definition, because they won't cause a kernel panic. If you have to choose between #1 and #3, Linus is putting his foot down and saying that the kernel's answer is #3.

If you are an asshole, are known to be an asshole, have no intention of changing that, and are working with others… maybe don’t. You’re free to work alone, but why make people around you miserable by having to deal with you? Go be an asshole to yourself and let everyone else work together.
It’s shocking that advocating for safe and inclusive work environments is such a controversial topic. If he were any other person, his behavior would be quashed in a second.
That's not exactly the vibe I'm getting from the typical Rust fanboys popping up whenever there's another CVE caused by the usage of C or C++ though ;)
Rust does seem to attract the same sort of insufferable personalities that have been so typical for C++ in the past. Why that is, I have no idea.
If his initial interpretation and expectation of the Rustacean response is in fact correct, the line of argumentation does not seem per se wrong, but I do think that it is bad practice in adversarial conversations to do the thing where you silently skip forward several steps in the argument and respond to what you expect (their response to your response)^n to be instead of the immediate argument at hand.
Which distros actually use the EFI System Partition that way? I've usually only seen the ESP used to hold the bootloader itself, with kernels and initramfs and the bootloader config pointing to them stored either in a separate /boot partition or in a /boot directory of the / filesystem.
If these people are insufferable to you, that I can't change your mind on. That said you might want to get used to it since major areas of industry are already considering C/C++ as deprecated (a paraphrasing from the Azure CTO recently)
Indeed, there's a lot of damage control going on in this thread walking back Rust's guarantees of safety despite that, up until this point, being Rust's only real selling point. It seems like every C/C++/Go/whatever repository has at least one issue suggesting a complete rewrite in Rust.
People say "it's raining" without having to add "except under roofs".
Its a bit sad that Linus needs to replicate individually what other engineering disciplines are mandated to by regulations. Look at car, train or aviation safety, they are decades ahead.
Bordering on the hypocritical... And I got the impression you missed his point as well.
It shouldn’t be the responsibility of every adopter to dig deep enough to find it’s actually not true.
Do you have any actual counter points or were you planning on beating that ad hominem to death?
The insufferable nature of the people isn't the advocating of safety. It's that Rust seems to have evolved a community of "X wouldn't have happened if Y was written in Rust!" and then walking away like they just transferred the one bit of knowledge everyone needed. They occupy less than 1% of the programming community and act like they single-handedly are the only people who understand correctness. It's this smug sense of superiority that is completely undeserved that makes the community insufferable. Not the safety "guarantees" of the language.
The language determines the definition of its constructs, not the software being written with it.
Edit: It's worth mentioning that while I think he is wrong, I think it's symptomatic of there not being a keyword/designation in Rust to express what Linus is trying to say. I would completely oppose misusing the unsafe keyword since it has negative downstream effects on all future dependency crates, where it's not clear what characteristics "unsafe" refers to which causes a split. So maybe they need to just discuss a different way to label these for now and agree to improve it later.
If we're going to be serious about who is being toxic, it's definitely Linus in this thread. Guy makes first mistake (by a very broad interpretation of "mistake". Perhaps "misunderstanding"?). Linus goes nuclear. And while his reasoning is sound, his argumentation cycles between threats, bad-faith arguments, and just plain old yelling.
What some people don't understand is that the Linux kernel isn't 'led' in any meaningful sense. But I suppose some projects don't need actual leadership? I once was recommended a Metallica documentary, because "It's amusing to see what emotionally stunted 40-50 year olds who have never had anyone tell them 'No' since 18 will do." That's the Linus vibe -- somehow we've limped along to here. Seriously, read the rust/rust-lang issues/RFCs. Those people sound like grownups contrasted to this.
Because "safe" in the context of a programming language is provably wrong and thus will trigger adversary reactions.
Rust is a hardened language, compared to C/C++. In the same way that Ada is hardened language, with different techniques, but the spirit is similar.
It’s not a “work environment”. You can’t report Linus to HR. If you have a problem with him, you can fork the kernel and convince others to follow you. Then you’ll have a mailing list where you can ban Linus for his style. Good luck!
In today's news "random angry guy on the Internet tells Linus Torvalds to go back to kindergarten, because reasons"
First of all, making a problem both obvious and easier to solve is better. Nothing "only" about it - it's better. Better both for the programmers and for the users. For the programmer the benefit is obvious, for the user problems will simply be more rare, because the benefit the programmer received will make software better faster.
Second, about the behavior. When you attempt to save changes to your document, would you rather have the corruption of your document due to a bug fail with fanfare or succeed silently? How about the web page you visited with embedded malicious JavaScript from a compromised third party, would you rather the web page closed or have your bank details for sale on a foreign forum? When correctness is out the window, you must abort.
But Rust's situation is still safer, because Rust can typically prevent more errors from ever becoming a run-time issue, e.g. you may not even need to use array indexing at all if you use iterators. You have a guarantee that references are never NULL, so you don't risk nullptr crash, etc.
Rust panics are safer, because they reliably happen instead of an actually unsafe operation. Mitigations in C are usually best-effort and you may be lucky/unlucky to silently corrupt memory.
Panics are a problem for uptime, but not for safety (in the sense they're not exploitable for more than DoS).
In the long term crashing loud and clear may be better for reliability. You shake out all the bugs instead of having latent issues that corrupt your data.
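The iterator point above can be made concrete. A minimal sketch (function names are illustrative, not from the thread): iterators and `Option`-returning APIs sidestep the panicking code paths entirely, so the out-of-bounds case never exists at run time.

```rust
// No indexing, so no possible out-of-bounds panic.
fn sum_evens(xs: &[i32]) -> i32 {
    xs.iter().filter(|x| *x % 2 == 0).sum()
}

// `get` returns Option instead of panicking like `xs[0]` would.
fn first(xs: &[i32]) -> Option<i32> {
    xs.get(0).copied()
}

fn main() {
    assert_eq!(sum_evens(&[1, 2, 3, 4]), 6);
    assert_eq!(first(&[]), None);
    println!("ok");
}
```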
You're calling his point BS while also strongly agreeing with it.
I guess you find it too obvious. But while it's obvious to many, there seem to be many who do not understand it. Issues involving rust often get derailed to pointlessness when rust's safety guarantees are treated as an absolute.
If you dig slightly below the surface in any major userspace codebase, it has abort paths everywhere. Every memory allocation might abort, every array index or dict lookup might throw an exception, which if uncaught will abort. Lock (or unlock) a mutex twice, abort.
The Rust standard library inherited this philosophy in large and small ways. An easy example (already being addressed) is memory allocation, but less obvious is stuff like "integer math is allowed to panic on overflow". It's not easy to write Rust code that is guaranteed not to panic in any branch.
Now the userspace-trained Rust folks are working in the kernel, and they want to be able to panic() when something goes horribly wrong, but that's not how the kernel code works. They'd have the same issue if you tried to get a bunch of GNOME contributors to write kernel drivers with GLib, even though GLib is pure C.
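To illustrate the "integer math is allowed to panic on overflow" point: plain `a + b` panics on overflow in debug builds, while the checked and wrapping variants make the overflow behavior explicit and never panic. A small sketch:

```rust
fn main() {
    let a: u8 = 250;
    // Explicitly fallible: returns None on overflow instead of panicking.
    assert_eq!(a.checked_add(10), None);
    assert_eq!(a.checked_add(5), Some(255));
    // Explicitly wrapping: never panics, wraps modulo 2^8 (260 % 256 = 4).
    assert_eq!(a.wrapping_add(10), 4);
    println!("ok");
}
```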
My intuition says that's the Halting Problem, so not actually possible to implement perfectly? https://en.wikipedia.org/wiki/Halting_problem
Linus sounds so ignorant in this comment. As if no one else thought of writing safety-critical systems in a language that had dynamic errors, and that dynamic errors are going to bring the whole system down or turn it into a brick. No way!
Errors don't have to be full-blown exceptions with all that rigamarole, but silently continuing with corruption is utter madness and in 2022 Linus should feel embarrassed for advocating such a backwards view.
[1] This works for Erlang. Not everything needs to be a shared-nothing actor, to be sure, but failing a whole task is about the right granularity to allow reasoning about the system. E.g. a few dozen to a few hundred types of tasks or processes seems about right.
So I can understand where Linus comes from.
Note that Rust is easier to work with than C here, because although the C-like API isn't shy about panicking where C would segfault, it also inherits enough of the OCaml/Haskell/ML idiom to have non-panic APIs for pretty much any major operation. Calling `saturating_add()` instead of `+` is verbose, but it's feasible in a way that C just isn't unless you go full MISRA.
This morning I was reading about the analysis of an incident in which a London tube train drove away with open doors. Nobody was harmed, or even in immediate danger, the train had relatively few passengers and in fact they only finally alerted the driver at the next station, classic British politeness (they made videos, took photographs, but they didn't use the emergency call button until the train got to a station)
Anyway, the underlying cause involves systems which were flooded with critical "I'm failing" messages and would just periodically reboot and then press on. The train had been critically faulty for minutes, maybe even days before the incident, but rather than fail, and go out of service, systems kept trying to press on. The safety systems wouldn't have allowed this failed train to drive with its doors open - but the safety critical mistake to disable safety systems and drive the train anyway wouldn't have happened if the initial failure had caused the train to immediately go out of passenger service instead of limping on for who knows how long.
The fact that arbitrary programs are undecidable is a red herring here.
If you want to run an Erlang-style distributed system in the kernel then that's an interesting research project, but it isn't where Linux is today. You'd be better off starting with SeL4 or Fuchsia.
- Remember when Linux had that bug in the leap second handling code that caused the kernel to partially crash and eat 100% CPU? That caused a >1MW spike in power usage at Hetzner at the time. That must have been >1GW globally. Many people didn’t notice it immediately, so it must have taken weeks before everyone rebooted.
- I’ve personally run into issues where not crashing caused Linux to go on and eat my file system.
On any Linux server I maintain, I always toggle those sysctls that cause the kernel to panic on oops, and reboot on panic.
What makes you say this? From the sample I've seen, Rust programs are far more diligent about handling errors (not panicking: either returning error or handling it explicitly) than C or Go programs due to the nature of wrapped types like Option<T> and Result<T, E>. You can't escape handling the error, and panicking potential is very easy to see and lint against with clippy in the code.
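A minimal sketch of that point (the `parse_port` helper is hypothetical, for illustration): with `Result`, the caller cannot get at the value without deciding what to do on failure, and the panicking escape hatch (`.unwrap()`) is visible in the code and lintable.

```rust
use std::num::ParseIntError;

// Propagates the error via Result rather than panicking.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

fn main() {
    // Handle the error explicitly instead of calling .unwrap():
    match parse_port("8080") {
        Ok(p) => assert_eq!(p, 8080),
        Err(e) => eprintln!("bad port: {e}"),
    }
    assert!(parse_port("not a port").is_err());
    println!("ok");
}
```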
As I posted up in this thread, the right way to handle this is to make dynamic errors either throw exceptions or kill the whole task, and split the critical work into tasks that can be as-a-whole failed or completed, almost like transactions. The idea that the kernel should just go on limping in a f'd up state is bonkers.
Depends. Is a kernel panic better than something acting wrongly? I prefer my kernel not to panic, at the expense of some error somewhere that may or may not crash my system.
If you look at the output of `dmesg` on any Linux system you often will see errors even in a perfectly working system. This is because programs of that size are by definition not perfect, there are bugs, the hardware itself has bugs, thus you want the system to keep running even if something is not working 100% right. Most of the time you will not even notice it.
> First of all, making a problem both obvious and easier to solve is better.
It's the same with assertions: useful for debugging, but we all disable them in production, when the program is not in the hands of a developer but of the customer, since for a customer a system that crashes completely is worse than a system that has some bugs somewhere.
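Rust has a direct analogue of "disable assertions in production": `debug_assert!` compiles away in release builds, while `assert!` stays and panics in production too. A sketch (the function is hypothetical):

```rust
// Invariant is checked loudly in debug builds; release builds
// limp along with a fallback instead of crashing.
fn checked_ratio(num: u32, den: u32) -> u32 {
    debug_assert!(den != 0, "den must be non-zero");
    if den == 0 {
        return 0; // fallback when the debug_assert is compiled out
    }
    num / den
}

fn main() {
    assert_eq!(checked_ratio(10, 2), 5);
    // checked_ratio(10, 0) would trip the debug_assert in a debug
    // build; in a release build it returns the fallback 0 instead.
    println!("ok");
}
```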
Not so sure about this. I see a good amount of acrimony toward C, C++, Go, Zig, etc. from the Rust side.
Just because it’s not an official “work environment” per your definition does not mean it isn’t hostile or intolerable were it actually one.
But actually countering that point is a lot harder, isn’t it?
Unreasonable people also build things alone where everyone else doesn’t have to deal with them.
The world is hardly as black and white as you make it seem.
That's different than solving the halting problem. You're not trying to prove it halts, you're just trying to prove it doesn't halt in a specific way, which is trivial to prove if you first make it impossible.
    if false {
        panic!()
    }
Basically you'd prohibit any call to panic, whether it may actually end up running or not.

Maybe my misunderstanding comes from my ignorance of the kernel's architecture, but surely there's a way to segregate operations into logical fallible tasks, so that a failure inside of a task aborts the task but doesn't put down the entire thing, and in particular not a sensitive part like kernel error reporting? Or are we talking about panics in this sensitive part?
Bubbling up errors in fallible tasks can be implemented using panic by unwrapping up to the fallible task's boundary.
To my understanding this is exactly what any modern OS does with user space processes?
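In userspace Rust, that "unwind up to the fallible task's boundary" idea can be sketched with `std::panic::catch_unwind`: the panic is caught at the task edge and turned into an ordinary error, so one failed task doesn't take down the whole program. (The kernel can't do this the same way, which is part of the debate; `run_task` here is a hypothetical example.)

```rust
use std::panic;

// Runs a "task" and converts any panic inside it into an Err
// at the task boundary instead of aborting the whole program.
fn run_task(ok: bool) -> Result<i32, String> {
    panic::catch_unwind(|| {
        if !ok {
            panic!("task invariant violated");
        }
        42
    })
    .map_err(|_| "task failed and was unwound".to_string())
}

fn main() {
    assert_eq!(run_task(true), Ok(42));
    assert!(run_task(false).is_err()); // failure is contained
    println!("ok");
}
```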
I always have the hardest of time in discussions with people advocating for or against that "you should stop computations on an incorrect result". Which computations should you stop? Surely, we're not advocating for bursting the entire computer into flames. There has to be a boundary. So, my take is to start defining the boundaries, and yes, to stop computations up to these boundaries.
I feel like we must have read two different articles. You sound crazy. Didn't read it your way at all.
> Think of that "debugging tools give a huge warning" as being the equivalent of std::panic in standard rust. Yes, the kernel will continue (unless you have panic-on-warn set), because the kernel MUST continue in order for that "report to upstream" to have a chance of happening.
"If the kernel shuts down the world, we don't get the bug report", seems like a pretty good argument. There are two options when you hit a panic in rust code:
* Panic and shut it all down. This prevents any reporting mechanism like a core dump. You cannot attach a normal debugger to the kernel.
* Ignore the panic and proceed with the information it failed, reporting this failure later.
The kernel is a single program, so it's not like you could just fork it before every Rust call and fail if they fail.
But determining that a function (such as panic) is never called because there are no calls to it is pretty easy.
> In the kernel, "panic and stop" is not an option (it's actively worse than even the wrong answer, since it's really not debugable), so the kernel version of "panic" is "WARN_ON_ONCE()" and continue with the wrong answer.
(edit, and):
> Yes, the kernel will continue (unless you have panic-on-warn set), because the kernel MUST continue in order for that "report to upstream" to have a chance of happening.
Did I read that right? The kernel must continue? Yes, sure, absolutely...but maybe it doesn't need to continue with the next instruction, but maybe in an error handler? Is his thinking so narrow? I hope not.
As someone who worked on a lot of OCaml projects, I would like to assure you that the issue really is the Haskell community which I too find completely unbearable. The rest of the FP community is far nicer/less smug.
For a long time, they just thought it was a shame some innovative constructs seemed to be stuck in their favourite languages (first-class functions, variant types, inference) and not percolating to the mainstream. This fight has mostly been won, which is great.
Note that we can only check for maybe, because in general we don't know if some code in the middle will somehow execute forever and never reach the panic call after it.
How many people are joining? How many people are joining because or in lieu of Linus? How many people are joining just because it’s Linux/Git/whatever (although granted that is in part due to Linus making them such big things)? How many people would have joined/wouldn’t have left if he wasn’t there?
Depends on what the operation is. If the operation is flying an airplane or controlling a nuclear reaction, you are sure that not completing the operation and just aborting the program is the worst outcome possible. Beside the error can crash the plane or melt down the nuclear reactor, but may also not have any effect at all, e.g. a buffer overflow overwrites a memory area that is not used for anything important.
Of course these are extreme example (for which Linux is of course out of discussion since it doesn't offer the level of safety guaranteed required), but we can make other examples.
One example could be your own PC. If you use Linux, take a look at the dmesg output and count the number of errors: there are probably a lot of them, for multiple reason. You surely want your system to continue running, and not panic on each of them!
What I'm not understanding: if the Linux Rust folks want to do their own thing and get rid of those rules and discussions, why don't they just fork off their own Rust-based Linux kernel and go?
The "political correct" toxicity comes from that group which continously wants to undermine long beforehand agreeds frontiers... (e.g. again this panic-is-more-safe discussion).
In my opinion, in the software world, there is a large number of people who are very convinced of their own correctness. When they do something wrong or are simply mistaken, a gentle correction doesn't work. Linus is probably used to dealing with these people. I'm not saying the person he was replying to was necessarily doing that, but after a while you have an automatic response.
The beauty and horror of OSS is that anyone can contribute. Having someone scream "WTF are you doing???" every once in a while isn't a bad thing. It's not nice to hear that being directed at you, but sometimes in life it is necessary.
Reading a book on Rust programming is an entirely different matter since authors tend to elaborate upon what they are claiming. The reader has to understand how things work and what the limits are. As such, there is less opportunity for misinformation to spread and less room for conflict.
TFA is about making it possible for the kernel to decide what to do, rather than exploding on the spot, which is terrible.
> * Panic and shut it all down. This prevents any reporting mechanism like a core dump. You cannot attach a normal debugger to the kernel.
No one is really advocating that. Clearly you need to be able to write code that fails at a smaller granularity than the whole kernel. See my comment upthread about what I mean by that: dynamic errors fail smaller granularity tasks and handlers deal with tasks failing due to safety checks going bad.
Where do you unwind to if memory is corrupted?
I don't think we're talking about what would be exception handling in other languages. I believe it's asserts. How do userland processes handle a failed assertion? Usually the process is terminated, but giving a debugger the possibility to examine the state first, or dumping core.
And that's similar to what they are doing in the kernel. Only in that in the kernel, it's more dangerous because there is limited process / task isolation. I think that is an argument that taking down "full-blown separate processes" might not even be enough in the kernel.
Although, often in embedded programming, a watchdog that resets the board can be the right thing to do. (As long as you don't get a boot loop.)
Regardless, I think just killing the task instantly, even with partial updates to memory, would be totally fine. It'd be cheap, whereas automatically undoing the updates (effectively a transaction rollback) is too expensive: software transactional memory just comes with too much overhead.
I vote "kill and unwind" and then dealing with partial updates has to be left to a higher level.
I will agree with you that I dread Rust in the kernel, hopefully it can continue to exist there peacefully without people getting too hot under the collar about their personal hang-ups. For all its flaws Rust has an amazing value prop in the borrow checker and I would love for memory bugs to be eliminated for good.
The safety is always with an asterisk. Rust provides memory safety — provided that unsafe blocks, FFI, and other code running in the same process, and the OS itself, and the hardware doesn't misbehave.
But if you accept that Python and Java can be called safe languages then Rust can be too. The other ones also have unsafe escape hatches and depend on their underlying implementations to be correct to uphold safety for their safe side.
But to restate briefly, the answer varies wildly between kernel and user programs, because a user program failing hard on corrupt state is still able to report that failure/bug, whereas a kernel panic is a difficult to report problem (and breaks a bunch of automated reporting tooling).
So in answer: Read the discussion.
Working out whether it will write 1 to the tape in general is undecidable, but in certain cases (you've just banned states that write 1) it's trivial.
If all of the state transitions are valid (a transition to a non-existing state is a halt) then the machine can't get into a state that will transition into a halt, so it can't halt. That's a small fraction of all the machines that won't halt, but it's easy to tell when you have one of this kind by looking at the state machine.
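The "no halting transition means no halt" check above is easy to make concrete. Below is a minimal sketch using a hypothetical, simplified transition-table encoding of my own (real Turing machine representations vary): a machine whose table contains no halting transition provably never halts, no matter the input.

```rust
#[derive(Clone, Copy)]
struct Transition {
    #[allow(dead_code)]
    write: u8,           // symbol written to the tape (unused by the check)
    next: Option<usize>, // None = halt; Some(s) = go to state s
}

// Returns true when every transition leads to an existing state, i.e. no
// halting transition is reachable at all: the machine trivially never halts.
fn trivially_non_halting(table: &[[Transition; 2]]) -> bool {
    table
        .iter()
        .flatten()
        .all(|t| matches!(t.next, Some(s) if s < table.len()))
}

fn main() {
    // Two states that ping-pong forever: no halting transition exists.
    let loop_forever = [
        [Transition { write: 0, next: Some(1) }, Transition { write: 1, next: Some(1) }],
        [Transition { write: 0, next: Some(0) }, Transition { write: 1, next: Some(0) }],
    ];
    assert!(trivially_non_halting(&loop_forever));

    // One state that halts when it reads a 1: not in the trivial class.
    let may_halt = [
        [Transition { write: 0, next: Some(0) }, Transition { write: 1, next: None }],
    ];
    assert!(!trivially_non_halting(&may_halt));
}
```

As the comment says, this only catches a small fraction of non-halting machines; the general question stays undecidable.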
Linux isn’t a microkernel. If you want to work on a microkernel, go work on Fuchsia. It’s interesting research but utterly irrelevant to the point at hand.
Anyway, the microkernel discussion has been happening for three decades now. Microkernels haven't historically had just a little lower performance. They had garbage performance, to the point of being unsuitable in the 90s.
Plenty of kernel code can't be written so as to be unwindable. That's the issue at hand. In a fantasy world it might have been written as such, but that's not the world we live in, which is what matters to Linus.
Although I think this can be better done by some special panic handler that performs a setjmp/longjmp and notifies other systems about the failure, without continuing to run with the wrong output...
This entirely depends on the industry and the customer. My team leaves asserts on in production code because our customers want aborts over silent misbehavior.
It is an order of magnitude cheaper for them if things fail loudly and they get a fix when compared to them tracking down quiet issues hours, days, or even months after the fact.
Personally I find Linus here not toxic at all, at most a borderline strong opinion. But come on: as much as we all need to be more empathetic, we should also be able to take some harsher critique and not make such a toxicity thing out of a more open, direct, opinionated response...
The discussion is a little more nuanced than just that. It is "we've entered an invalid/undefined/corrupt state, now what?" And in essence saying "We ONLY panic as a matter of last resort, we'll just spit out a bunch of loggable errors and soft fail from the kernel call until then."
The threat is pretty clear? "If Rust people don't get this, we will have to part ways." This is an ultimatum? It's crazy girlfriend/boyfriend material? It's ridiculous after one contributor tries something that Linus thinks won't work in the kernel. Ridiculous. Just say no.
The slander as well? "Rust’s community, in aggregate, have developed a reputation." And you know what? The C/C++/Zig/Nim/Haskell/Clojure communities have developed a reputation too, but, gosh, I don't talk about it because I know labeling groups isn't helpful/is completely non-technical.
As you said, you have the option to reboot on panic, but Linus is absolutely not wrong that this size does not fit all.
What about a medical procedure that WILL kill the patient if interrupted? What about life support in space? Hitting an assert in those kinds of systems is a very bad place to be, but an automatic halt is worse than at least giving the people involved a CHANCE to try and get to a state where it's safe to take the system offline and restart it.
This is how Rust has always defined it. Linus is specifically saying that "Rust people" don't understand what "safe" is but... they do, he doesn't. He could say "Rust defines it as X, the kernel needs Y" but he doesn't say that, he implies that Rust people just don't understand the word "safe" or that they think Rust is safer than it is, which is simply not true. As I said, quite ironic given history.
> I wouldn't blame their discussion partners for not knowing what the developer means when they talk about some code being "safe".
I mean, I would definitely blame them if they're also going to go on an insulting rant about their definition being wrong.
> without people getting too hot under the collar about their personal hang-ups
Impossible, in my opinion, until a ton of people retire.
https://lkml.org/lkml/2022/9/19/640
Get at least down to here:
https://lkml.org/lkml/2022/9/20/1342
What Linus seems to be getting at is that there are many varying contextual restrictions on what code can do in different parts of the kernel, that Filho etc appear to be attempting to hide that complexity using language features, and that in his opinion it is not workable to fit kernel APIs into Rust's common definition of a single kind of "safe" code. All of this makes sense, in user land you don't normally have to care about things like whether different functional units are turned on or off, how interrupts are set up, etc, but in kernel you do. I'm not sure if Rust's macros and type system will allow solving the problem as Linus frames it but it seems like a worthy target & will be interesting to watch.
And I don’t think he’s making a system level claim, that the whole train system should be designed to limp on through failures. He’s claiming that the kernel needs to be able to limp on so that the systems that use it can have the best chance of e.g. sending automated bug reports. (Or you can turn off the limping behavior if you want; maybe trains should do that. But maybe a train’s control system randomly rebooting might be more catastrophic than leaving its doors open? I don’t know.)
From a couple messages up-thread in the OP:
> … having behavior changes depending on context is a total disaster. And that's invariably why people want this disgusting thing.
They want to do broken things like "I want to allocate memory, and I don't want to care where I am, so I want the memory allocator to just do the whole GFP_ATOMIC for me".
And that is FUNDAMENTALLY BROKEN.
If you want to allocate memory, and you don't want to care about what context you are in, or whether you are holding spinlocks etc, then you damn well shouldn't be doing kernel programming. Not in C, and not in Rust.
It really is that simple. Contexts like this ("I am in a critical region, I must not do memory allocation or use sleeping locks") is fundamental to kernel programming. It has nothing to do with the language, and everything to do with the problem space.
So don't go down this "let's have the allocator just know if you're in an atomic context automatically" path. It's wrong. It's complete garbage. It may generate kernel code that superficially "works", but one that is fundamentally broken, and will fail and become unreliable under memory pressure
If the kernel acquiesces to certain philosophies that are opposite to its intent as-a-kernel for many other environments and contexts it must support, a cascade of later patches could derail things completely. It may become too much effort to undo, and the project must limp along--until that mountain of tech debt costs too much to fix.
Maybe the kernel cannot fail fast for good reasons. And the Linux project cannot fail fast for equally good reasons.
And possibly, if a technically compelling reason presents itself, Linus may fully back it--even contributing to that work himself.
Keep in mind that in a kernel panic no hardware is assumed to work, so "just write to storage!" isn't an assumption you can make; you're in a panic, and the IO could literally have been pulled out.
It is not, because that is exactly the problem: what's your view of safe and inclusive is, to some, hostile and exclusive. And until people realize that this extreme creates similarly well-behaved assholes: never mind.
> dynamic errors fail smaller granularity tasks and handlers deal with tasks failing due to safety checks going bad.
Yes and that's why Rust is bad here (but it doesn't have to be). Rust _forces_ you to stop the whole world when an error occurs. You cannot fail at a smaller granularity. You have to panic. Period. This is why it is being criticized here. It doesn't allow you any other granularity. The top comment has some alternatives that still work in Rust.
Unless I’m mistaken, in “safe” Rust, programs can still crash but only by calling “panic”, or other trivial cases (explicitly calling “exit” with a nonzero return value, calling into ffi code, etc)
Detecting functions which may “panic” and “exit” is very easy, significantly easier than detecting possible UB. Avoiding these functions (or providing a comment “no-panic guarantee” like “safety guarantee” for unsafe Rust) doesn’t seem very hard, since lots of panicking functions have a non-panicking variant.
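To illustrate the "non-panicking variant" point: many std operations that can panic have counterparts returning `Option` instead, so writing panic-free code along a given path is mostly a matter of choosing them. A small sketch:

```rust
fn main() {
    let v = [10, 20, 30];

    // Indexing (v[5]) panics out of bounds; `get` returns an Option instead.
    assert_eq!(v.get(5), None);
    assert_eq!(v.get(1), Some(&20));

    // Integer division by zero panics; `checked_div` returns None.
    let d: i32 = 7;
    assert_eq!(d.checked_div(0), None);
    assert_eq!(d.checked_div(2), Some(3));

    // Overflow panics in debug builds; `checked_add` returns None instead.
    assert_eq!(i32::MAX.checked_add(1), None);
}
```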
People don't have a universal right to collaborate to this project, especially on their own terms.
In the same way, these projects will like you said, evolve in positive or negative ways with no God given right to exist and thrive.
As the sibling comment pointed out, if you extend this idea to clean up all state, you end up with processes.
I do have some doubt on the no panic rule. But instead of emulating processes in the kernel, I’d see a firmware like subsystem whose only job it is to export core dumps from the local system, after which the kernel is free to panic.
As a general point and in my view, and I agree this is an appeal to authority, Linus has this uncanny ability to find compromises between practicality and theory that result in successful real world software. He’s not always right but he’s almost never completely wrong.
I'm only agreeing with the fact that there are no absolute guarantees. Not that his use of the fact in the point he makes has any relevance...
If somebody had said "The earth is round, therefore we should not care about getting lost and GPS, because you can always keep going and end up where you started on a sphere anyway", then I would have also "strongly agreed" with the first factoid, but think the overall point BS.
Not quite, because stack overflows can cause panics independent of any actual invocation of the panic macro.
You need to either change how stack overflows are handled as well, or you need to do some static analysis of the stack size as well.
Both are possible (while keeping rust turing complete), so it's still not like the halting problem.
Rust needs to fix that then. So we agree on that.
Reliably saving state in the face of sudden total failure is both very tricky and app-specific. Just saving state changes automatically won't do it -- partial writes of complex state are likely to be inconsistent without luck or careful design and QA controls (tests, testing, on-going controls to ensure nothing new operates or relies on anything outside the safe state-saving mechanism).
It makes a lot more sense to put the effort into making the OS continue as well as it can, vs requiring every app to harden itself against sudden total failures.
That's a pretty intolerable outright hostile and exclusive judgement :(
Maybe I’m too young (just past 30) but is it just me or is that some kind of attitude that emerged in the last 10-15 years?
And I mean not only in programming, but in general.
A small amount of people which is very vocal about something and start pushing everybody else to their thing while simultaneously shaming and/or making fun of those who either disagree or aren’t generally interested.
I kinda see a pattern here.
Either way, it’s very annoying.
Going back to the rust topic… I recently started working with some software written in a mix of C++ and Java. I don’t own the codebase, I “just” have to get it working and keep it working. So I had to reach out to another person for some performance issues and this guy starts the usual “should be rewritten in rust” … jesus christ dude, I don’t care for your fanboyism right now, either help me or tell me you won’t so I’ll look somewhere else.
And of course, if as an outsider this is the experience I have to go through every time I deal with rust people… I’ll try to minimise my exposure to such people (and to the language, if necessary).
Only if one can't separate a trivial factoid being used in an argument from the quality of the argument itself and the point being made...
You, know, you can agree that "there are no absolute guarantees" while still considering it a BS argument to use this fact to support that having the (non-absolute) guarantees Rust does give is in any way less useful...
You can also disagree that "there are no absolute guarantees", while true, has any place to be used in an argument against the use of safer compilers...
That's of: "There are no absolute guarantees against dying from a crash, and safety belts don't give you any, so let's not use safety belts either" quality
A panic caused by the formatting in a rarely used log output taking down all of a large company's NTP servers simultaneously, for example, would not be seen as a reasonable tradeoff.
Too bad there was no one around to do that to Linus; maybe he'd finally realize that being an asshole is generally not a correct response.
It's not like there's not exceptions in Rust though. The error handling is thorough to a fault when it's used. Unwrap is just a shortcut to say "I know there might be bad input, I don't want to handle it right now, just let me do it and I'll accept the panic."
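That "shortcut" framing of `unwrap` is easy to show side by side with the explicit handling it replaces. A small sketch:

```rust
fn main() {
    // Shortcut: `unwrap` means "panic if this is Err, give me the value otherwise".
    let n = "42".parse::<i32>().unwrap();
    assert_eq!(n, 42);

    // Explicit handling: the same operation with no panic possible on this path.
    let fallback = match "not a number".parse::<i32>() {
        Ok(n) => n,
        Err(_) => -1, // recover with a default instead of panicking
    };
    assert_eq!(fallback, -1);
}
```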
Rust’s “safety” has always meant what the Rust team meant by that term. There’s no gotcha to be found here except if you can find some way that Rust violates its own definition of the S-word.
This submission is not really about safety. It’s a perfectly legitimate concern that Rust likes to panic and that panicking is inappropriate for Linux. That isn’t about safety per se.
“Safety“ is a very technical term in the PL context and you just end up endlessly bickering if you try to torture the term into certain applications. Is it safer to crash immediately or to continue the program in a corrupted state? That entirely depends on the application and the domain, so it isn’t a useful distinction to make in this context.
EDIT: The best argument one could make from this continue-can-be-safer perspective is that given two PLs, the one that lets you abstract over this decision (to panic or to continue in a corrupted state, preferably with some out of band error reporting) is safer. And maybe C is safer than Rust in that regard (I wouldn’t know).
Kinda a strawman there. That's got to account for, what, 0.0001% of all use of computers, and probably they would never ever use Linux for these applications (I know medical devices DO NOT use Linux).
I keep seeing claims that Rust users are insufferable and claim that Rust protects against everything. But, as someone who has started using Rust around 0.4, I have never seen these insufferable users.
I imagine that they lurk on some communities?
I will go further: if you think what Linus said here is unreasonable or rude, you really need to get out more.
Yes, a kernel panic will cause disruption when it happens. But it will also give a precise error location, which makes reporting and fixing the root cause easier. It could be harder to pinpoint if the code rolled forward in a broken state.
It will cause loss of unsaved data when it happens, but OTOH it will prevent corrupted data from being persisted.
It's up to you to choose the right failure strategy and monitor your system if you don't want to panic, and take appropriate measures and not just ignore the warning.
It's not Linus who sounds ignorant here, it's the people applying user-space "best practices" to the kernel. If the kernel panics, the system is dead and you've lost the opportunity to diagnose the problem, which may be non-deterministic and hard to trigger on purpose.
By his own standards he’s been very polite and calm. Remarkably so I’d say.
He used to be way ruder in the past, then decided to work on that and be kinder.
You can clearly see that in those emails.
The fact that he doesn’t agree with somebody and articulates why doesn’t mean he’s rude.
And then there would be no Linux kernel. So much for companies.
I think history will show that we can do a lot better than C/C++ and Rust is one of the best steps yet to show that. Rust will be replaced by something better some day and the cycle will repeat.
The issue being discussed here is that Rust comes from a perspective of being able to classify errors and being able to automate error handling. In the kernel, it doesn't work like that, as we're working with more constraints than in userland. That includes hardware that doesn't behave like it was expected to.
> including e.g. the monitor attached to the PC used for displaying
> X-ray images
Somewhat off-topic, but I used to work in a dental office. The monitors used for displaying X-rays were just normal monitors, bought from Amazon or Newegg or whatever big-box store had a clearance sale. Even the X-ray sensors themselves were (IIRC) not regulated devices; you could buy one right now on AliExpress if you wanted to.

Of course I've had many negative comments from "Rustaceans", with their defence of their negativity being "we don't like it when someone comes into our community".
It is a shame, because Rust is a pretty cool language, but at this current rate I don't really see it becoming "the" systems programming language du jour.
I think Zig is probably a much better fit for writing a Kernel in a safer language. Again, rust programmers pile on and tell me that "zig isn't memory safe". We can't make use of other languages that bring safety benefits without the dog pile of "you should use Rust it's safe". Apparently nothing is safe other than Rust.
It's not a condemnation of rust, but rather a guidepost that, if followed, will actually benefit rust developers.
But wouldn't reading outside an array's bounds also possibly do that? It could seg fault, which is essentially the same thing.
Is it that reading out of bounds on an array isn't guaranteed to crash everything while a panic always will?
In rust, safe code is code that does not have the unsafe keyword.
If all the unsafe code is sound, then you (provably) get high level guarantees about memory safety, etc.
The rust people are complaining that some of the unsafe RCU is unsound. They have a valid point. According to the rust manual, when you make unsound libraries sound, common courtesy dictates you create a CVE for the old implementation.
This is all in the rust book; it's pretty close to "hello world".
Anyway, the rust crowd is definitely right here. It would be better if the rust RCU bindings were sound.
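For anyone unfamiliar with the sound/unsound distinction being invoked here, a minimal sketch (my own toy example, not the RCU bindings in question): a sound wrapper is one whose safe surrounding code upholds the unsafe block's preconditions, so no safe caller can trigger undefined behavior.

```rust
// Sound: the unsafe block's precondition (index 0 in bounds) is checked by
// the safe code around it, so no caller can cause UB through this function.
fn first_byte(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        None
    } else {
        // SAFETY: we just checked that index 0 is in bounds.
        Some(unsafe { *v.get_unchecked(0) })
    }
}

// An UNSOUND version would skip the emptiness check: it would still compile,
// but safe callers passing an empty slice would hit undefined behavior --
// which is exactly what "unsound" means.

fn main() {
    assert_eq!(first_byte(b"abc"), Some(b'a'));
    assert_eq!(first_byte(b""), None);
}
```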
Or in an actual vehicle, the "emergency stop" (if that means just stomping on the brakes) can flip the car and kill its passengers.
Prediction: in time the same will happen to "rust in the kernel" as happened to "c++ in the kernel": Linus will forbid it not because of some intrinsic problem with the language, but because the culture of the community prevented them from understanding the kernel rules.
Which is safe. It's inconvenient, but it's safe. Failures of this sort do happen, electrical fires are probably the most extreme example. They're annoying, but nobody is at risk if you stop. Since the tube is in civilisation (even at the extreme ends of the London Underground which are outside London, like Chesham, this is hardly wilderness, you can probably see a house from where your train stopped if there aren't trees in the way) we can just walk away.
https://commons.wikimedia.org/wiki/File:Chesham_Tube_Station...
> Linus was saying no, you carry on despite the error until you get to the next station
Depending on the error the consequences of attempting to "carry on" may be fatal and it's appropriate that the decision to attempt this rests with a human, and isn't just the normal function of a machine determined to get there regardless.
fn foo<T>() -> Option<T> {
// Oops, something went wrong and we don't have a T.
None
}
fn bar<T>() -> T {
if let Some(t) = foo() {
t
} else {
// This could've been an `unwrap`; just being explicit here
panic!("oh no!");
}
}
A panic in this case is exactly like an exception in that the function that's failing doesn't need to come up with a return value. Unwinding happens instead of returning anything. But if I was writing `bar` and I was trying to follow a policy like "never unwind, always return something", I'd be in a real pickle, because the way the underlying `foo` function is designed, there aren't any T's sitting around for me to return. Should I conjure one out of thin air / uninitialized memory? What does the kernel do in situations like this? I guess the ideal solution is making `bar` return `Option<T>` instead of `T`, but I don't imagine that's always possible?

The ideas of Rust weren’t new when Rust was developed. The actual integration into a new programming language beyond experimental status was, and the combination with ML-style functional programming.
For example, say there's a bug in the Linux kernel that would produce a "panic" at midnight Dec 31st 2022... do we accept a billion devices shutting down? In the best case rebooting and resuming a whatever user space program was running?
Despite the bad taste, I think the obvious answer is as Linus says: the Kernel should keep going despite errors.
I think you must have missed out on how Linux currently handles these situations. It does not silently move on past the error; it prints a stack trace and CPU state to the kernel log before moving on. So you have all of the information you'd get from a full kernel panic, plus the benefit of a system that may be able to keep running long enough to save that kernel log to disk.
If you look at how POSIX does it, pretty much every single function has error codes, signaling everything from lost connections, to running out of memory, entropy or whatnot. Failures are hard to abstract away. Unless you have some real viable fallback to use, you're going to have to tell the user that something went wrong and leave it up to them to decide what the application can best do in this case.
So in your case, I would return Result<T>, and encode the errors in that. Simply expose the problem to the caller.
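A minimal sketch of "expose the problem to the caller" in that POSIX-errno spirit (the `read_config` name and path are hypothetical, just for illustration): the error value carries the failure kind, and the caller decides what to do with it.

```rust
use std::fs;
use std::io;

// Hypothetical helper: instead of panicking when the file is missing or
// unreadable, surface the io::Error to the caller, errno-style.
fn read_config(path: &str) -> Result<String, io::Error> {
    fs::read_to_string(path) // propagates NotFound, PermissionDenied, ...
}

fn main() {
    match read_config("/definitely/not/here") {
        Ok(text) => println!("loaded {} bytes", text.len()),
        // The caller decides: retry, fall back to defaults, or report.
        Err(e) => assert_eq!(e.kind(), io::ErrorKind::NotFound),
    }
}
```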
What's funny about this is that (while it's true!) it's exactly the argument that Rustaceans tend to reject out of hand when the subject is hardening C code with analysis tools (or instrumentation gadgets like ASAN/MSAN/fuzzing, which get a lot of the same bile).
In fact when used well, my feeling is that extra-language tooling has largely eliminated the practical safety/correctness advantages of a statically-checked language like rust, or frankly even managed runtimes like .NET or Go. C code today lives in a very different world than it did even a decade ago.
1. Have a constraint on T that lets you return some sort of placeholder. For example, if you've got an array of u8, maybe every read past the end of the array returns 0.
fn bar<T: Default>() -> T {
if let Some(t) = foo() {
t
} else {
eprintln!("programmer error, foo() returned None!");
Default::default()
}
}
2. Return an `Option<T>` from bar, as you describe.

3. Return a `Result<T, BarError>`, where `BarError` is a struct or enum describing possible error conditions.
#[non_exhaustive]
enum BarError {
FooIsNone,
}
fn bar<T>() -> Result<T, BarError> {
if let Some(t) = foo() {
Ok(t)
} else {
eprintln!("programmer error, foo() returned None!");
Err(BarError::FooIsNone)
}
}

So if some enthusiasts are trying to use Rust at cross purposes for Linux, they are likely to appear obnoxious and entitled, and it is perfectly right to challenge them to prove that they can make Rust suitable.
There's more high quality and polite preaching earlier in the thread, for example:
> Please just accept this, and really *internalize* it. Because this isn't actually just about allocators. Allocators may be one very common special case of this kind of issue, and they come up quite often as a result, but this whole "your code needs to *understand* the special restrictions that the kernel is under" is something that is quite fundamental in general.

Like any language that has very cool features, there are people that take that tool not as a tool but as a religion.
You can even look in my comment history and see people arguing with me when I say I was a rust fan, but memory safety isn't a requirement in some areas of programming. One person made it their mission to convince me that can't possibly be the case and that (in my example of video games) any memory bug crashes the game and will make users quit and leave.
There is an enormous variation in output targets for a panic on Linux: graphics hardware attached to PCIe (requires a graphics driver and possibly support from the PCIe bus master, I don't know), serial interface (USART driver), serial via USB (serial-over-USB driver, USB protocol stack, USB root hub driver, whatever bus that is attached to)... There is a very real chance that the error reporting ends up encountering the same issue (e.g. some inconsistent data on the kernel heap) while reporting it, which would leave the developers with no information to work from if the kernel traps itself in an endless error handling loop.
It's called manufacturing consent and it's all around us.
How is a "guarantee" not claiming something is 100% ?
Array bounds checks are one of the most important safety measures Rust takes, and those have to happen at runtime (if the optimizer can't prove they'll never fire). Similarly, locking types like `Mutex` of course do all their locking and unlocking at runtime, though they also use the type system to express the fact that they will do that.
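Both runtime mechanisms mentioned above are easy to see in a few lines; a small sketch:

```rust
use std::sync::Mutex;

fn main() {
    // Bounds checks happen at runtime unless the optimizer can prove them
    // away; `get` makes the check's outcome visible as an Option rather
    // than a panic.
    let data = vec![1, 2, 3];
    assert!(data.get(10).is_none());

    // Mutex does its locking at runtime, but the type system guarantees the
    // data is only reachable through the guard returned by `lock`.
    let counter = Mutex::new(0);
    {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
    } // guard dropped here, lock released
    assert_eq!(*counter.lock().unwrap(), 1);
}
```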
I feel like it's a defensive reaction, that people feel like Rust is seeking to obviate arcane skills they've built over the course of their careers. Which I don't think is true, I think there will always be a need for such skills and that Rust has no mission to disrupt other language communities, but I can understand the reaction.
> You can even look in my comment history
Is this the thread you're referring to?
https://news.ycombinator.com/item?id=32878868
Because I genuinely don't see what you're talking about. No one seems to "make it their mission" and no one seems to be arguing for Rust in particular, as much as this category of languages.
Engineers should want the immediate stop, because that's safer, especially in safety critical situations.
I have no idea what this actually refers to? "Panic is more safe"? Rust doesn't choose to panic on a failed memory allocation in the kernel, and never intended to. It was always TODO until it was implemented? Linus is using a userspace panic as an example here?
As to this thread's particular issue, the API for an allocation wasn't settled, and this is the discussion. I think the contributor was completely within his remit to say, "Heck, we could do this is a more memory safe way..." And Linus was completely right to say "Yeah, that's not how we do allocations here." The only problem is thinking being a dick is a good way to lead a community.
I think some fantasize about being able to be a dick in a FOSS project just like Linus (which feels like "if only I was a strong man dictator"), and I think that's an absurd desire. The Linux kernel is sui generis. In no other area of the world can anyone act this way, and be productive.
Analysis of panic-safety in Rust is comparatively easy. The set of standard library calls that can panic is finite, so if your tool just walks every call graph you can figure out whether panic is disproven or not.
> According to the rust manual, when you make unsound libraries sound, common courtesy dictates you create a CVE for the old implementation.
I cannot find the words to describe how supercilious this is.
Getting logs out is critical.
- Only useful when actually being used, which is never the case. (Seriously, can we make at least ASAN the default?)
- Often costly to always turn them on (e.g. MSAN).
- Often requires restructuring or redesign to get the most out of them (especially fuzzing).
Rust's memory safety guarantee does not suffer from first two points, and the third point is largely amortized into the language learning cost.
The rules of argument existed long before the Linux kernel. You don't get to change terms introduced within an argument with a clear meaning because it helps you create a strawman. If you want to change the definition of a term mid-argument, you telegraph it. Once again, this is called conflation.
That's not the issue though. It's that "safe" means something is actually safe. My house isn't safe if it's on fire, even if the house is in a safe neighborhood. Linus' claim is that "rust people" sometimes themselves conflate memory safety with general code safety, simply because "safe" is in the name. So much so that they will at times sacrifice code quality to achieve this goal, despite (a) memory safety not being real safety and (b) there being no way to guarantee memory safety in the kernel anyway. What he is saying is that "rust people" (whatever that means) are at times trading off real safety or real code maintenance/performance for "rust safety."
>a compiler - or language infrastructure - that says "my rules are so ingrained that I cannot do that" is not one that is valid for kernel work.
And
>I think you are missing just how many things are "unsafe" in certain contexts and cannot be validated.
>This is not some kind of "a few special things".
>This is things like absolutely _anything_ that allocates memory, or takes a lock, or does a number of other things.
>Those things are simply not "safe" if you hold a spinlock, or if you are in a RCU read-locked region.
>And there is literally no way to check for it in certain configurations. None.
You can judge whether he is correct, but he never said Rust's safety implies absolute safety, only that some Rust users are treating it that way by sacrificing the code for it. If that's the case then it makes a lot of sense to start using a more sensible word like "guaranteed" instead of safe. I think part of what contributes to this idea is that "unsafe" code is written with the keyword "unsafe", as if code not written that way is safe, and code written with "unsafe" is bad. That's not to say that "unsafe" actually implies any of that - all it means is that it's not guaranteed to be memory safe - but according to Linus it creates a certain mentality which is incongruent with the nature of kernel development. And the reason for that is that safe and unsafe are general English words with strong connotations such as:
>protected from or not exposed to danger or risk; not likely to be harmed or lost.
>uninjured; with no harm done.
And for unsafe:
>able or likely to cause harm, damage, or loss
But there is no need to let userspace processes continue to run, which is exactly what Linux does.
It feels hostile and intolerable to you.
There are many people who find the risk-averse non-confrontational corpspeak intolerable.
I would like to learn otherwise, but even a React JS+HTML page is undecidable... its scope is limited by Chrome's V8 JS engine (like a VM), but within that scope I don't think you can prove anything more. Otherwise we could just write static analysis to check if it will leak passwords...
But day-to-day programs are not trivial... as for your example, just switch it with this code: `print(gcd(user_input--,137))`... now it's much harder to "just ban some final states"
Things like "kernel error reporting" don't exist as a discrete element. Sure, you might decide to stop everything and only dump the log onto earlycon, but running with a serial cable to every system that crashed would be rather annoying. For all the kernel knows, the only way to get something to the outside world might be through a USB Ethernet adapter and a connection that is tunneled by a userspace TUN device, at which point essentially the whole kernel must continue to run.
In this case it seems to be mutual. Even open hostility from some leading members of Zig community. Which is shame because these two languages could nicely coexist.
Trying to tweak the kernel to make integration easier, in a supposedly non-harmful way, doesn't harm anything.
I'm not sure where you are getting this from the thread? Linus is using a userspace panic as an example. That's not something that is actually happening in the kernel?
Edit to add: My guess is that the Rust community might still be worse because now we have widespread Internet access and social media.
You, and the billions of people using Linux, are free to fork the code and exclude Linus completely.
>It’s shocking that advocating for safe and inclusive work environments
This isn't a "work environment" in the way you seem to be implying. The vast majority of people contributing to the kernel do not work for Linux or Linus.
Depending on the semantic property to check for, writing such an algorithm isn’t trivial. But the Rust compiler for example does it for memory safety, for the subset of valid Rust programs that don’t use Unsafe.
The only sure way I can think of, is when you force your program to go through a more narrow non-turing algorithm. Like sending data through a network after Serialization. Where we could limit the De-Serialization process to be non Turing (json, yaml?).
Same for code, that uses non-turing API, like memory allocation in a dedicated per process space. Or rust "borrow" mechanics that the compiler enforces.
But my point is, everyday programs are "arbitrary programs" and not a red herring. Surely from the kernel's perspective, which is Linus' point imo.
Rust users are generally friendly to one another, and to people who are interested in Rust. However, some Rust users are toxic when talking to people outside the community or to people who disagree.
That's why a lot of us (in the Rust community) don't notice it; we spend most of the time inside our own community talking to each other and being friendly to each other.
This is a trait common to any bubble or insular community whether it be about politics, religion, economics, or whatever. It's fairly easy to recognize once you get in the habit of dis-identifying with your own side.
There's also a phenomenon in human psychology where we tend to forgive "our side's" misbehavior, presumably because it's in service to a higher ideal and therefore forgivable. It's the difference between "passionately spreading the good word" and "aggressive evangelism", two views of the same action. After learning about this I've even seen it in myself, though hopefully I've learned to counteract it a bit.
Note that this isn't unique to Rust, other languages have this too to an extent.
It's something I really hope we can leave behind, because it's hurting an otherwise beneficial message that Rust can bring a new tradeoff that is favorable for a lot of situations.
That works for some systems: those for which "some NVRAM or something" evaluates to a real device usable for that purpose. Not all Linux systems provide such a device.
> But there is no need to let userspace processes continue to run, which is exactly what Linux does.
Userspace processes usually contain state that the user would also like to be persisted before rebooting. If my WiFi driver crashes, there's nothing helpful or safer about immediately bringing down the whole system when it's possible to keep running with everything but networking still functioning.
Regarding the second question, in the general case you have to guess or think hard, and proceed by trial and error. You notice that the analyzer takes more time than you’re willing to wait, so you stop it and try to change your program in order to fix that problem.
We already have that situation today, because the Rust type system is turing-complete. Meaning, the Rust compiler may in principle need an infinite amount of time to type-check a program. Normally the types used in actual programs don’t trigger that situation (and the compiler also may first run out of memory).
By the way, even if Rust’s type system wasn’t turing-complete, the kind of type inference it uses takes exponential time, which in practice is almost the same as the possibility of non-halting cases, because you can’t afford to wait a hundred or more years for your program to finish compiling.
> But my point is, everyday program are "arbitrary program"
No, most programs we write are from a very limited subset of all possible programs. This is because we already reason in our heads about the validity and suitability of our programs.
The differences are that they are actually meant to be used for exceptional situations ("assert violated => there's a bug in this program" or "out of memory, catastrophic runtime situation") and that they are not typed (rather, the panic holds a type-erased payload).
Other than that, it performs unwinding without UB, and is catchable[0]. I'm not seeing the technical difference?
[0]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
Oh? How do you do that? Do you have a written guide handy? Very curious about this.
It's even worse on things like car dashboards: some warning lights on dashboards need to be ASIL-D conformant, which is quite strict. However, developing the whole dashboard software stack to that standard is too expensive. So the common solution these days is to have a safe, ASIL-D compliant compositor and a small renderer for the warning lights section of the display while the rendering for all the flashy graphics runs in an isolated VM on standard software with lower safety requirements. It's all done on the same CPU and GPU.
- use after free
- breaking the aliasing rules
- causing a "data race" (e.g. writing to the same value from multiple threads without a lock)
- producing an invalid value (like a bool that's not 0 or 1)
There's some other technical stuff like "calling a foreign function with the wrong ABI", but those four above capture most of what safe Rust wants to guarantee that you never do. In contrast, the same page provides an interesting list of things that Rust doesn't consider UB and that you can do in safe code, for example:
- deadlocks and other race conditions that aren't data races
- leak memory
- overflow an integer
- abort the whole process
If there isn't 100% safety then why bother, it is the usual argument for the last 40 years.
> You notice that the analyzer takes more time than you’re willing to wait,
I see, thanks, didn't know about this feedback loop as I'm not a rust programmer. Still on my todo list to learn.
The most obvious is mutable references. In Rust there can be either one mutable reference to an object or there may be any number of immutable references. So if we're thinking about this value here, V, and we've got an immutable reference &V so that we can examine V, well... it's not changing; there are no mutable references to it by definition. The Rust language won't let us have &mut V, the mutable reference, at the same time &V exists, and so it needn't ever account for that possibility†.
In C and C++ they break this rule all the time. It's really convenient, and it's not forbidden in C or C++ so why not. Well, now the analysis you wanted to do is incredibly difficult, so good luck with that.
† This also has drastic implications for an optimiser. Rust's optimiser can often trivially conclude that a = f(b); can't change b, whereas a C or C++ optimiser is obliged to admit that actually it's not sure, and we need to emit slower code in case b is just an alias for a.
There have been various examples of WiFi driver bugs leading to security issues. Didn’t some Broadcom WiFi driver once have a bug in how it processed non-ASCII SSID names, allowing you to trigger remote code execution?
I've always been on the C++ side when arguing C vs C++ since 1993; I already considered C a primitive option, coming from Turbo Pascal 6.0 and finding such a simplistic pseudo-macro assembler.
So yeah, in a sense the Rust community is similarly hyped as we were adopting Turbo Vision, CSet++, OWL, MFC, PowerPlant, Tools.h++, POET, and thinking C would slowly fade away, and we could just keep on using a language that while compatible with C, offered the necessary type system improvements for safer code.
But then the FOSS movement doubled down on C as means to write the GNU ecosystem, on the first editions of the GNU manifesto, and here we are.
I'm not even sure if this is the case. I have seen enough toxic Rust users, but at least in my experience they rarely overlap with who are active in the community. This suggests that they are experiencing typical newcomer syndrome, comparable to Haskell newcomers' urge to write a monad tutorial, and also explains that why a disproportional number of non-Rust users observe toxic Rust users---if you are a Rust user but don't preach about Rust everywhere, how can others tell if you are indeed a Rust user? :-)
Was Wedson acting in an untoward way here that in some way exemplifies something significant about the Rust community? No, not really. So, yeah, I think your comment above is a pointless low blow, cheap shot, an excuse to act nasty about some super annoying Rust comment you probably read months ago. And it just sounds like whining to me.
Mind you, that experience also severely soured me on the quality of medical software systems, due to poor quality of the software that ran in that distribution. Linux itself was a golden god in comparison to the crap that was layered on top of it.
Nevertheless a long-lived application like, e.g., a webserver will catch panics coming from its subtasks (e.g., its request handlers) via catch_unwind
I'm not familiar with kernel development in general or Linux in particular. I would have expected there to be an error reporting subsystem, so that if a given subsystem fails the failure is reported to the error reporting subsystem (which hopefully exposes a more modern interface than serial cable), but this might be naive on my part.
> For all kernel knows, the only way to get something to the outside world might be through USB Ethernet adapter and connection that is tunneled by userspace TUN device, at which point essentialy whole kernel must continue to run
Again I'm missing context on this discussion. For all I know this could be an error originating with a driver, since Rust support for Linux is for driver development now. It would make sense to me that an error in the GPU driver doesn't prevent the Ethernet driver from reporting the bug
If you need an example of the rust community being toxic, I give you https://github.com/actix/actix-web
Look up the history and realize they bullied an open source project leader into leaving open source for good.
Let's not be too pedantic. You, as an experienced medical device engineer, probably knew what I meant was that they would never use Linux in the critical parts of a medical device as the OP had originally argued. Any device would definitely do all of its functionality without the part with Linux on it.
The OP was still a major strawman, regardless of my arguments, because the Linux kernel will never be in the critical path of a medical device without a TON of work to harden it from errors and such. Just the fact that Linus' stance is as said would mean that it's not an appropriate kernel for a medical device, because they should always fail with an error and stop under unknown conditions rather than just doing some random crap.
I still don't understand the relevance, this neither appears toxic nor to be a discussion of Rust; this looks like they put forward an out-there idea and you didn't care for it, which just seems like a discussion about consumer protection laws. I also don't see the connection from Actix drama to the idea that people are exaggerating the capabilities of Rust or causing problems for other language communities - I don't know much about it, I'm fully willing to believe toxicity was involved, but a breakdown in communication between a maintainer and their community doesn't seem like the behavior we're discussing and I don't see any evidence this was peculiar to Rust and not a phenomenon in open source at large.
I don't want to relitigate some thread I wasn't even a part of, I just don't understand.
Why people feel attacked by Linus words is a mystery to me.
This is false. "Safety" and "Liveness" are terms used by the PL field to describe precise properties of programs and they have been used this way for like 50 years (https://en.wikipedia.org/wiki/Safety_and_liveness_properties). A "safety" property describes a guarantee that a program will never reach some form of unwanted state. A "liveness" property describes a guarantee that a program will eventually reach some form of wanted state. These terms would be described very early in a PL course.
And that's pretty easy to statically analyze.
The point is that you can produce a perfectly working analysis method that is either sound or complete but not both. "Nowhere in the entire program does the call 'panic()' appear" is a perfectly workable analysis - it just has false positives.
The problem is that this definition of safety is very arbitrary. Sometimes crashing a process can be safe (as in not causing serious problems) but sometimes not. Accessing an array out of bounds can be safe sometimes and sometimes not, and so on.
Rust says that here is a list of things that are always safe and here is a list of things that are always unsafe and then people want safety everywhere so they take that definition of safety to other contexts where it doesn't make sense, like the kernel.
(input state, input symbol) --> (output state, output symbol, move left/right)
This is all static, so you can look at the transition table to see all the possible output symbols. If no transition has output symbol 1, then it never outputs 1. It doesn't matter how big the Turing machine is or what input it gets, it won't do it. This is basically trivial, but it's still a type of very simple static analysis that you can do. Similarly, if you don't have any states that halt, the machine will never halt. This is like just not linking panic() into the program: it isn't going to be able to call it, no matter what else is in there.
If not, would you care to drop some links?
I'd say B is nearly always the better choice, because halting is a known state that it's almost always possible to recover from, while going into an unknown state may cause you to get hacked or to damage your peripherals. But if we were operating, say, a Mars rover, and shutting down meant we would never be able to boot again, then it'd be better to take kernel A and attempt to recover from whatever state we find ourselves in. That's pretty exotic, however.
In the case of an unanticipated error in a software component, we always need input from an external source to correct ourselves. When you're the kernel, that generally means either a human being or a hypervisor has to correct you; better to do so from a halted state than an entirely unknown one. Trying to muddle through despite is super dangerous, and makes your software component into lava in the case of a fault.
> In the kernel, "panic and stop" is not an option
That's simply not true. It's an option I've seen exercised many times, even in default configurations. Furthermore, for some domains - e.g. storage - it's the only sane option. Continuing when the world is clearly crazy risks losing or corrupting data, and that's far worse than a crash. No, it's not weird to think all types of computation are ephemeral or less important than preserving the integrity of data. Especially in a distributed context, where this machine might be one of thousands which can cover for a transient loss of one component but letting it continue to run puts everything at risk, rebooting is clearly the better option. A system that can't survive such a reboot is broken. See also: Erlang OTP, Recovery Oriented Computing @ Berkeley.
Linus is right overall, but that particular argument is a very bad one. There are systems where "panic and stop" is not an option and there are systems where it's the only option.
The proper answer to those is redundancy, not continuing in an unknown and quite likely harmful state.
That you view it as exotic is partly a lack of imagination on your part; with a little more effort it's possible to identify similar use cases that are much closer to home than Mars.
But that doesn't really matter. What matters is that the Linux kernel needs to support both options, because it's just one component in a larger system and that context outside the kernel is what determines which option is correct for that system.
Can you elaborate on this? Because failing storage is a common occurrence that usually does not warrant immediately crashing the whole OS, unless it's the root filesystem that becomes inaccessible.
If you feel there are some that would add to this conversation, feel free to share them.
What are you quoting? I don't see this anywhere in the thread.
The nearest I see is:
If you cannot get over the fact that the kernel may have other
requirements that trump any language standards, we really can't work
together.
A reasonable, politely delivered statement directed to an individual as opposed to Rust. It was in response to this rather cringy bit of lecturing:
No one is talking about absolute safety guarantees. I am talking about
specific ones that Rust makes: these are well-documented and formally
defined.
Rust has no formal language specification yet. It's still "an area of research," to paraphrase what is said when the question is asked. No defined memory model either; from the current Rust reference:
Rust does not yet have a defined memory model. Various academics
and industry professionals are working on various proposals, but
for now, this is an under-defined place in the language.
One could argue (not me; I'm far too pragmatic for such things) that Linus is being exceptionally generous in entertaining Rust in its current state.
For context, this is OP's sentence that I responded to in particular. Ensuring safety [1] is way less trivial than looking for a call to "panic" in the state machine. You can remove the calls to "panic" and this alone does not make your program safer than the equivalent C code. It just makes it more kernel friendly.
[1] not only memory safety
There's a video of passengers doing this for real in this 2016 news article:
I was paraphrasing. I didn't want to write a page length comment, and won't here, but there were a few more instances of similar ultimatums (like "Or, you know, if you can't deal with the rules that the kernel requires, then just don't do kernel programming.") And all are similarly ridiculous/dickish. Really no need for such dramatic convulsions, Linus, where Wedson was simply trying to explain the API expectations of the Rust language.
Re: the rest, I think you are conflating Rust's UB guarantees with a specified memory model.
That's a false dichotomy, you don't get to choose between definitely crashing or maybe crashing. That would be nice but it's not on the menu. Crashing is just the best case scenario, so if you can make your system stop instead of being incorrect, that's great.
> but we all disable them in production (assertions)
We don't all do that.
I concede that it depends on the use case. You might not care if you got a single user non-networked gaming console for example. A bug could even become a welcomed part of the experience there. I hope these cases are more rare than not though.
So just change that assumption since for these edge cases that is an incorrect assumption.
"Performance" is a red herring. In a safety-critical system, what matters is the behaviour and the consistency. ThreadX provides timing guarantees which Linux can not, and all of the system threads are executed in strict priority order. It works extremely well, and the result is a system for which one can understand the behaviour exactly, which is important for validating that it is functioning correctly. Simplicity equates to reliability. It doesn't matter if it's "slow" so long as it's consistently slow. If it meets the product requirements, then it's fine. And when you do the board design, you'll pick a part appropriate to the task at hand to meet the timing requirements.
Anyway, systems like ThreadX provide safety guarantees that Linux will never be able to. But the interface is not POSIX. And for dedicated applications that's OK. It's not a general-purpose OS, and that's OK too. There are good reasons not to use complex general-purpose kernels in safety-critical systems.
IEC 62304 and ISO 13485 are serious standards for serious applications, where faults can be life-critical. You wouldn't use Linux in this context. No matter how much we might like Linux, you wouldn't entrust your life to it, would you? Anyone who answered "yes" to that rhetorical question should not be trusted with writing safety-critical applications. Linux is too big and complex to fully understand and reason about, and as a result impossible to validate properly in good faith. You might use it in an ancillary system in a non-safety-critical context, but you wouldn't use it anywhere where safety really mattered. IEC 62304 is all about hazards and risks, and risk mitigation. You can't mitigate risks you can't fully reason about, and any given release of Linux has hundreds of silly bugs in it on top of very complex behaviours we can't fully understand either even if they are correct.
Especially in a distributed storage system using erasure codes etc., losing one machine means absolutely nothing even if it's permanent. On the last storage project I worked on, we routinely ran with 1-5% of machines down, whether it was due to failures or various kinds of maintenance actions, and all it meant was a loss of some capacity/performance. It's what the system was designed for. Leaving a faulty machine running, OTOH, could have led to a Byzantine failure mode corrupting all shards for a block and thus losing its contents forever.
BTW, in that sort of context - where most bytes in the world are held BTW - the root filesystem is more expendable than any other. It's just part of the access system, much like firmware, and re-imaging or even hardware replacement doesn't affect the real persistence layer. It's user data that must be king, and those media whose contents must be treated with the utmost care.
For clarification, I responded to this in particular because "safety" is being conflated with "panicking" (bad for kernel). I reckoned "Unexpected conditions" means "arbitrary programs", hence my response, otherwise you could just remove the call to panic.
You put it in quotes and didn't mention any paraphrasing. Linus didn't write it.
> Rust's UB guarantees
Can you point out the normative document that provides these guarantees? Rust doesn't have one as far as I know.
I think it's a fair characterization of what was said. Feel free, as everyone is, to read the entire thread again. I'm not a journalist. You have the primary source at your finger tips!
> Can you point out the normative document that provides these guarantees?
You're looking at the Rust reference right? https://doc.rust-lang.org/reference/behavior-considered-unde...
Not normative, as stated here[1], linked from the page you cite.
I think inventing Linus quotes is unfair.
Pointing out whatever those are is fine. Linus pointing out the expectations of the Linux kernel is fine too, and no amount of invoking fictional formalisms trumps them.
For better or worse, Linux is NOT a microkernel. Therefore, the sound microkernel wisdom is not applicable to Linux in its present form. The "impedance match" of any new language added to the Linux kernel is driven by what current kernel code in C is doing. This is essentially a Linux kernel limitation. If Rust cannot adapt to these requirements it is a mismatch for Linux kernel development. For other kernels like Fuchsia, Rust is a good fit. BTW, the core Fuchsia kernel itself is still in C++.
Even in a distributed, fault-tolerant multi-node system, it seems like it would be useful for the kernel to keep running long enough for userspace to notify other systems of the failure (eg. return errors to clients with pending requests so they don't have to wait for a timeout to try retrieving data from a different node) or at least send logs to where ever you're aggregating them.
Neither QNX nor ThreadX is intended to be a general-purpose kernel. I haven't looked into it for a long time, but QNX's performance used to not be very good. It's small. It can boot fast. It gives you guarantees regarding return times. Everything you want from an RTOS in a safety-critical environment. It's not very fast however, which is why it never tried to move towards the general market.
Since the mechanisms for ensuring the orderly stoppage of all such activity system-wide are themselves complicated and possibly error-prone, and more importantly not present in a commodity OS such as Linux, the safe option is "opt in" rather than "opt out". In other words, don't try to say you must stop X and Y and Z ad infinitum. Instead say you may only do A and B and nothing else. That can easily be accomplished with a panic, where certain parts such as dmesg are specifically enabled between the panic() call and the final halt instruction. Making that window bigger, e.g. to return errors to clients who don't really need them, only creates further potential for destructive activity to occur, and IMO is best avoided.
Note that this is a fundamental difference between a user (compute-centric) view of software and a systems/infra view. It's actually the point Linus was trying to get across, even if he picked a horrible example. What's arguably better in one domain might be professional malfeasance in the other. Given the many ways Linux is used, saying that "stopping is not an option" is silly, and "continuing is not an option" would be equally so. My point is not that what's true for my domain must be true for others, but that both really are and must remain options.
P.S. No, stopping userspace is not stopping everything, and not what I was talking about. Or what you were talking about until the narrowing became convenient. Your reply is a non sequitur. Also, I can see from other comments that you already agree with points I have made from the start - e.g. that both must remain options, that the choice depends on the system as a whole. Why badger so much, then? Why equivocate on the importance (or even meaningful difference) between kernel vs. userspace? Heightening conflict for its own sake isn't what this site is supposed to be about.
And the *reality* is that there are no absolute guarantees. Ever. The "Rust is safe" is not some kind of absolute guarantee of code safety. Never has been. Anybody who believes that should probably re-take their kindergarten year, and stop believing in the Easter bunny and Santa Claus.
This is needlessly talking down to competent developers as if they are deluded children. It's also not the only instance of it in the linked message. He would be far better off just going straight into the technical differences between what he is willing to permit in his kernel vs. what the Rust-oriented developers seek.
But why are you complaining that a group of people who don't want to work in your default environment went off and created their own?
I don't understand what you have to complain about: they have their way of working and you want to change that because it offends you?
Sounds like you're the problem, not them.
I don't suppose monitors report calibration data back to display adapters do they?
We're talking specifically about the current meaning of a Linux kernel panic. That means an immediate halt to all of userspace.
Source: aerospace engineer with a flight sciences background, and also software reviewer for flight systems.
https://smile.amazon.com/EVanlak-Passthrough-Generrtion-Elim...
So you prefer a system that is completely unusable to a system that may be used, but with some errors? If you prefer the first, you will not be able to use practically anything. If you look at the `dmesg` output of a running Linux system you can find a lot of errors; if even a single one of them was turned into a panic, your computer would not even be able to boot.
Nothing is perfect, and errors will appear. Ideally errors should be handled at the lowest possible level, but if unhandled to me errors should not result in a complete system crash.
> We don't all do that.
I do that. Reason is that not doing that in my use case would not only render completely unusable the product, but not even upgradable with an over the air firmware update. So better that the system will continue running than it crashing (and then rebooting).
And the *reality* is that there are no absolute guarantees. Ever. The "Rust is safe" is not some kind of absolute guarantee of code safety. Never has been. Anybody who believes that should probably re-take their kindergarten year, and stop believing in the Easter bunny and Santa Claus.
What's going on here is not "woke people" trying to protect every little snowflake's feelings, rather it's noting the ranter is making himself feel good at others' expense with no other value added. His rants are completely superfluous to the substantive technical dialogue.
When your wifi driver crashes yet again, would you choose to discard all unsaved files open in your editor, just on the very unlikely possibility that they're corrupted now?
In the context of Rust, there are a number of safety properties that Rust guarantees (modulo unsafe, FFI UB, etc.), but that set of safety properties is specific to Rust and not universal. For example, Java has a different set of safety properties, e.g. its memory model gives stronger guarantees than Rust’s.
Therefore, the meaning of “language X is safe” is entirely dependent on the specific language, and can only be understood by explicitly specifying its safety properties.
Small rant: ARM Cortex processors overwrite the stack pointer on reset. That's very, very dumb, because after the watchdog trips you have no idea what the code was doing. Which means you can't report what the code was doing when that happened.
1) needing to reload a wifi driver to reinitialize hardware (with a tiny probability of memory corruption) OR choosing to reboot as soon as convenient (with a tiny probability of corrupting the latest saved files)
2) to lose unsaved files for sure and not even know what caused the crash
You're not wrong but you chose a hilarious example. Unwrap's entire purpose is to turn unhandled errors into panics!
Array indexing, arithmetic (with overflow-checks enabled), and slicing are examples where it's not so obvious there be panic dragons. Library code does sometimes panic in cases of truly unrecoverable errors also.
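The distinction can be seen in a few lines of std Rust (a sketch; the commented-out lines are the ones that would panic):

```rust
fn main() {
    let xs = [10u8, 20, 30];

    // Indexing panics on out-of-bounds; `get` returns an Option instead.
    assert_eq!(xs.get(5), None);           // no panic
    // let _ = xs[5];                      // would panic: index out of bounds

    // With overflow-checks enabled (default in debug builds), plain
    // arithmetic panics; `checked_add` turns the failure into a value.
    let big: u8 = 250;
    assert_eq!(big.checked_add(10), None); // no panic
    // let _ = big + 10;                   // would panic in a debug build

    // Unwrap's job is exactly to turn an unhandled error into a panic.
    let parsed: Result<i32, _> = "not a number".parse();
    assert!(parsed.is_err());
    // parsed.unwrap();                    // would panic with a ParseIntError
}
```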
Like “memory safety”?
Accepting the idea that rust guarantees aren't necessarily always good is needed to accept the idea that those guarantees might need to be relaxed, or at least don't necessarily justify linux kernel changes.
For practically all non-virtualized Linux hosts out there, the kernel crash dump mechanism works by adding ASCII text to kmsg, which is then read by journald, processed a little, and appended to a file -- which just means submitted back to the kernel for writing, which means the FS needs to work, disk I/O needs to work, and so on.
Rust could do the external tooling better than any other language out there, but they're so focused on the _language_ preventing abuse that they've largely missed the boat.
Almost all discussion about Rust is in comparison to C and C++, by far the dominant languages for developing native applications. C and C++ are famously neither type-safe nor memory-safe and it becomes a pretty easy shorthand in discussions of Rust for "safety" to refer to these properties.
Real engineers, like say the people who code the machines that fly on Mars, don't want "oops that's unexpected, ruin the entire mission because that's safer". Same for the Linux kernel.
Just as the parent comment generalized about the two kinds of people out there, I added other examples of generalizations about people. But that’s all they are, generalizations. Not specific examples.
That's not the right way to characterize this. Rust has unsafe for code that is correct but that the compiler is unable to verify. Foreign memory access (or hardware MMIO) and cyclic data structures are the big ones, and those are well-specified, provable, verifiable regimes. They just don't fit within the borrow checker's world view.
Which is something I think a lot of Rust folks tend to gloss over: even at its best, most maximalist interpretation, Rust can only verify "correctness" along axes it understands, and those aren't really that big a part of the problem area in practice.
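A minimal illustration of that division of labor: the compiler refuses to dereference a raw pointer outside an `unsafe` block, and inside one the programmer, not the compiler, is the party making the validity promise.

```rust
fn main() {
    let value: u32 = 42;
    let ptr = &value as *const u32;

    // Raw-pointer reads are outside the borrow checker's view, so the
    // compiler requires an explicit `unsafe` block; here the programmer
    // vouches that `ptr` is valid, aligned, and points to a live u32.
    let read = unsafe { *ptr };
    assert_eq!(read, 42);
}
```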
As a manager, if I had a report who exhibited this level of verbal aggression, we would have a talk, and if it happened again, we'd be going through HR. It's not acceptable, regardless of technical merit.
At the end of the day, what Linux does is what Linus wants out of it. He's stated, often, that halting the CPU at the exact moment something goes wrong is not the goal. If your goal is to do that, you might not be able to use Linux. If your goal is to put Rust in the Linux kernel, you might have to let go of your goal.
But ok, uninformed me would have guessed checking for that would be pretty straightforward in statically typed Rust. Is that something people want? Why isn't there a built-in mechanism to do it?
And I thought it was clear that kernel panic is different from Rust panic, which you don't seem to distinguish. Rust panic doesn't need to cause a kernel panic because it can be caught earlier.
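For illustration, in userspace std Rust a panic can be caught at a boundary with `catch_unwind` (the kernel sets its own panic policy, so this is only a sketch of the general point, not how kernel Rust handles it):

```rust
use std::panic;

fn main() {
    // The out-of-bounds index inside the closure panics, but the panic
    // is caught at this boundary instead of taking down the process.
    let result = panic::catch_unwind(|| {
        let v: Vec<i32> = Vec::new();
        v[0] // panics: index out of bounds on an empty Vec
    });
    assert!(result.is_err());
    println!("recovered from the panic, still running");
}
```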
My understanding is that negative votes are for things that don't contribute to discussion, yet all my comments are in the negatives except when I mentioned I actually am using Rust. Then suddenly the commenter stopped talking about our discussion altogether and started mentioning learning Rust.
It’s frustrating because I like rust, but I can’t seem to criticize it in the slightest.
After saying everyone was empowered to use their tool, they tried to kick someone off the team for working for Palantir.
Regardless of politics, it's kinda unfair to make political statements using the Rust accounts, then turn around and say other people can't be part of Rust because they work for a company that is political.
Edit: I should also add (probably earlier too) that all my examples are specific to the USA FDA process. I'm sure some other place might not have the same rules.
I hope it's rare, but I think a persistent nag window ("Your display isn't calibrated and may not be accurate") is probably a better answer than refusing to work altogether, because it will be clear about the source of the problem and less likely to get nailed down.
I really have a hard time understanding how anyone could possibly think that's okay.
It sounds like the kernel's quality is so poor that UB is commonplace and even expected at this point. Pretty scary how many systems are relying on this huge pile of broken C code to hopefully only slightly corrupt itself and your system.
I'm not even sure how useful Rust in the kernel is going to be considering they want it to just ignore errors. You can't even have bounds checking on arrays because invalid accesses might be detected at runtime and cause an error, which is totally insane.
This reminds me of two things. Good system design needs hardware-software codesign. Oxide Computer has identified this, but it was probably much more common before the '90s than after. The second thing is that all things can fail, so a strategy that only hardens one component is fundamentally limited, even flawed. If a component must not fail, you need redundancy and supervision. Joe Armstrong would be my source of quote if I needed to find one.
Both Rust and Linux have some potential for improvement here, but the best answers may lie in their relation to the greater system, rather than within themselves. I'm thinking of WASM and hardware codesign respectively.
Allow me to introduce you to Therac-25: https://en.wikipedia.org/wiki/Therac-25
Safety-critical systems will try to recover to a working state as much as possible. They are designed with redundancy so that if one path fails, they can use path 2 or path 3 toward a safe, usable state.
I'm mostly familiar with EU rules, but as far as I know the FDA regulations follow the same idea of tiered requirements based on potential harm done.
The idea that rust guarantees aren't necessarily always good is completely orthogonal to a condescending diatribe about how "there are no absolute guarantees", and totally needlessly connected to "go back to kindergarten" and "stop believing in Santa" BS.
"I guarantee I will be there" - but I could always be hit by a bus, or have a very serious family issue to tend to, or an earthquake might happen, or the airports might be closed due to Covid and so on.
"Our bank guarantees your money" - yeah, except if the global economy collapses, or the country is hit by an asteroid, or if there's martial law, and so on.
The trivial such cases are irrelevant to the guarantees they want to offer (and same for Rust), and it's a bad move to point to them and consider them as part of his argument.
Not to mention they're saying "trying to", not guaranteeing in the first place. Which acknowledges things like possible bugs, or some edge case not handled, etc.
There's a clear safety spectrum, with C near the bottom and Rust near the top. It's tedious for people to keep saying "well it's not right at the top so we should just keep using C".
I'm sure pro-seatbelt people were called "zealots" back in the day too.
"If you want to allocate memory, and you don't want to care about what context you are in, or whether you are holding spinlocks etc, then you damn well shouldn't be doing kernel programming. Not in C, and not in Rust.
It really is that simple. Contexts like this ("I am in a critical region, I must not do memory allocation or use sleeping locks") is fundamental to kernel programming. It has nothing to do with the language, and everything to do with the problem space."
Rust proponents mean exactly "memory safety" when they say rust is safe because that is the only safety rust guarantees.
This is good for when the things you are using could error, e.g. when you use an arbitrary Unicode string as a filename you might get an error, because depending on the OS there might be characters that are valid Unicode but that you cannot use in filenames (or the other way around, possible filenames that are not valid Unicode).
In most programming languages, this is something you need to know about in order to catch it. In Rust this is an Error that you can handle or not, but you can't forget to deal with it.
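A minimal sketch of what that looks like in std Rust (the path is made up; the point is that the `Result` cannot be silently ignored -- you have to go through the Ok/Err split somehow):

```rust
use std::fs::File;

fn main() {
    // File::open returns Result<File, io::Error>; the File is only
    // reachable through the Ok arm, so the error case can't be forgotten.
    match File::open("/definitely/not/a/real/path") {
        Ok(_f) => println!("opened"),
        Err(e) => println!("could not open: {e}"),
    }
}
```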
https://www.fda.gov/medical-devices/human-factors-and-medica...
Honestly, the FDA regulations go too far vs the EU regs. The company I worked for was based in the EU and the products there were so advanced compared to our versions. Ours were all based on an original design from Europe that was approved and then basically didn't change for 30 years. The European device was fucking cool and had so many features; it was also capable of being carried around rather than rolled. The manufacturing was almost all automated too, but in the USA it was not automated at all: it was humans assembling parts, then recording it in a computer terminal.
This is configurable; by default, with optimizations on, math overflow doesn't panic in Rust, it wraps around.
Obviously the kernel won't enable overflow panics here except in debug mode.
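The explicit-operator forms make the choice independent of build flags, e.g.:

```rust
fn main() {
    let x: u8 = 255;
    // Each method pins down one overflow behavior regardless of whether
    // overflow-checks is enabled for the build:
    assert_eq!(x.wrapping_add(1), 0);     // wrap around (release default)
    assert_eq!(x.checked_add(1), None);   // failure as a value
    assert_eq!(x.saturating_add(1), 255); // clamp at the type's max
}
```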
Well, I don't go around pointing out how random groups, formed by like-minded people voluntarily, are doing collaboration "wrong".
If I did, on some random internet forum, complain that the local Street Rod Enthusiasts Club[1] doesn't do proper agendas for their meetings, or that a book-reading club[1] that I know of isn't properly structured, or that the volunteer SPCA group is using the wrong IM/chat software to communicate ... well, then I'm the problem.
[1] That I have no intention of joining
The first priority is safety, absolutely and without question. And then the immediate second priority is the fact that time is money. For every minute that the system is not operating, x amount of product is not being produced.
Generally, having the software fully halt on error is both dangerous and time-consuming.
Instead you want to switch to an ERROR and/or EMERGENCY_STOP state, where things like lasers or plasma torches get turned off, motors are stopped, brakes are applied, doors get locked/unlocked (as appropriate/safe), etc. And then you want to report that to the user, and give them tools to diagnose and correct the source of the error and to restart the machine/line [safely!] as quickly as possible.
In short, error handling and recovery is its own entire thing, and tends to be something that gets tested for separately during commissioning.
[1] PLCs do have the ability to <not stop> and execute code in a real-time manner, but I haven't encountered a lot of PLC programmers who actually exploit these abilities effectively. Basically, for more complex situations you're quickly going to be better off with more general-purpose tools [2], at most handing off critical tasks to PLCs, micro-controllers, or motor controllers etc.
[2] except for that stupid propensity to give-up-and-halt at exactly that moment where it'll cause the most damage.
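As a hedged sketch (all names invented, not from any real control system), the "transition to a safe state instead of halting" idea looks roughly like:

```rust
// Illustrative state machine: a fault never halts the controller;
// it transitions to a safe state an operator can recover from.
#[derive(Debug, PartialEq)]
enum MachineState {
    Running,
    EmergencyStop, // torches off, motors stopped, brakes applied
}

fn on_fault(state: MachineState) -> MachineState {
    match state {
        // Instead of panicking/halting, switch to the safe state and
        // leave diagnosis and restart to the operator.
        MachineState::Running => MachineState::EmergencyStop,
        s => s, // already stopped: stay stopped
    }
}

fn main() {
    assert_eq!(on_fault(MachineState::Running), MachineState::EmergencyStop);
}
```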
It wasn't my example. It was mike_hock's, and I was responding in the context they had set.
> Most Linuxes aren't like that.
Your ally picked the medical-device and space-life-support examples. If you think they're invalid because such systems don't use Linux, why did you forgo bringing it up with them and then change course when replying to me? As I said: not helpful.
The point is not specific to Linux, and more Linux systems than you seem to be aware of do adopt the "crash before doing more damage" approach because they have some redundancy. If you're truly interested, I had another whole thread in this discussion explaining one class of such cases in what I feel was a reasonably informative and respectful way while another bad-faith interlocutor threw out little more than one-liners.
As I've said over and over, both approaches - "limp along" and "reboot before causing harm" - need to remain options, for different scenarios. Anyone who treats the one use case they're familiar with as the only one which should drive policy for everyone is doing the community a disservice.
Felt kinda bad until I thought about how well a "Linux literally killed me" headline would do on HN, but then I realized I wouldn't be able to post the article if I actually died. Such is life. Or death? One or the other.
And continuing on the parent's comment, Rust can only make its memory guarantees by restricting the set of programs you can write, while static analysis for C and the like has to work on the whole set, which is simply an undecidable problem. As soon as unsafe is in the picture, it becomes undecidable in Rust as well, in general.
The other half is that kernel has a lot of rules of what is safe to be done where, and Rust has to be able to follow those rules, or not be used in those contexts. This is the GFP_ATOMIC part.
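Userspace std Rust has a rough analogue of the fallible-allocation idea in `Vec::try_reserve` (this is not the kernel API -- just a sketch of "allocation failure as a value instead of a panic/abort"):

```rust
fn main() {
    let mut buf: Vec<u8> = Vec::new();

    // try_reserve surfaces allocation failure as an Err instead of
    // aborting the process, so the caller decides what to do about it.
    match buf.try_reserve(1024) {
        Ok(()) => println!("reserved at least 1 KiB"),
        Err(e) => println!("allocation failed: {e}"), // no panic, no abort
    }
}
```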
I think that is untrue. I worked at the Network Systems arm of Bell Labs for sixteen years, and we could demonstrate five-nines of uptime on complex systems written entirely in C.
C is a rough tool, I will grant you that, and Rust is a welcome addition to the toolkit, but saying that most code written in C is horribly unsafe, does not make it true.
But... this isn't true??
> I'm sure pro-seatbelt people were called "zealots" back in the day too.
Given that vehicles were grandfathered in, pro-seatbelt people were irrelevant to owners & drivers of said vehicle.
Just like some rust zealot asserting that some existing project with millions of lines of code should be rewritten in rust is irrelevant to the project maintainers.
https://www.acsac.org/2002/papers/classic-multics.pdf
So not even back then.
If we are talking about products like PC-lint, SonarQube, or Coverity, the experience is much more than that.
I just took a break from creating measurable commercial value in Haskell.
Grab a Starbucks, shop at Target, or use Facebook recently?
Congrats, you used production Haskell code delivering measurable commercial value to you and millions of others.
See my comment upthread, you seem to be misinformed on the use and prevalence of Haskell in the real world.
Also I had to laugh at this:
> No one is talking about absolute safety guarantees. I am talking about specific ones that Rust makes: these are well-documented and formally defined
As the saying goes "name three".
1. Coming from C++, my productivity is x2-x3 in Rust, making Rust a middle point between C++ and Python (about x8 productivity). What's more, if we factor maintenance time in, the lower costs of maintenance of Rust code makes the multiplier tend to x10, which is equal or better than Python (whose maintenance costs are important).
2. I have a colleague coming from Python (so a very different background than my C++ background), and he doesn't "get lost in the complexity of Rust" but after some use of Rust makes pretty much the same conclusions as I do: initial coding slower than Python, but roughly equal when you factor in maintenance time. He now writes the quick tools that could be Python scripts in the past in Rust when we suspect that they won't be one-off scripts (which happens very often). We get ease of distribution (static binaries), portability (to Linux and Windows), and better performance out of it too.
Although this is a comparison with C++ and Python, not C, the reasons why are simple and apply equally so to C:
1. Easy access to a good ecosystem. Adding dependencies in C or C++ is a pain. Very easy to do in Rust, avoiding the need to reinvent the wheel (squarely). C suffers even more from this, given its lack of standard data structures (everything is a linked list :-D)
2. Memory safety and lack of UB in safe Rust brings a definitive simplicity in coding, code review and debug.
3. Result-oriented APIs and a generally expressive type system are what end up bridging the gap with Python over time.
What Rust definitely has is a learning curve. It is not optimized for taking the language without deep diving into it, or learning it in a short time. IMO it is a reasonable trade-off, given that the experience past the learning curve is so good, and that many of the things that make the learning curve so steep are integral to that experience (exclusive borrows, trait system, ...).
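For what it's worth, the Result-oriented style from point 3 looks like this in a toy example (function name and input format invented for illustration):

```rust
use std::num::ParseIntError;

// The `?` operator propagates the Err variant to the caller,
// keeping the happy path linear.
fn sum_csv(line: &str) -> Result<i64, ParseIntError> {
    let mut total = 0i64;
    for field in line.split(',') {
        total += field.trim().parse::<i64>()?; // bubbles up parse errors
    }
    Ok(total)
}

fn main() {
    assert_eq!(sum_csv("1, 2, 3"), Ok(6));
    assert!(sum_csv("1, x, 3").is_err());
}
```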
Am I?
You suppose a lot of things about me from literally a bunch of words.
"A 'tiny probability of memory corruption' can easily become a CVE" is still FUD, because it is simply not true in most cases. The words "tiny" and "easily" show the bias here.
The rest of the conversation seems a symptom of Hypervigilance: Fixation on potential threats (dangerous people, animals, or situations).
Fortunately, the decision isn't up to you either.
TL;DR: Let's see what happens when average C programmers are forced to use Rust. Will their code be more secure? I see no convincing arguments one way or the other. Only measuring XYZs.