zlacker

[parent] [thread] 20 comments
1. blinki+(OP)[view] [source] 2022-10-02 15:33:52
> Even "safe" rust code in user space will do things like panic when things go wrong (overflows, allocation failures, etc). If you don't realize that that is NOT some kind of true safely[sic], I don't know what to say.

> Not completing the operation at all, is not really any better than getting the wrong answer, it's only more debuggable.

What Linus is saying is 100% right, of course. He is setting expectations straight: just because you replaced C code backed by thousands (or whatever huge number) of man-months of effort, corrections, and refinement with Rust code doesn't mean absolute safety is guaranteed. For him as a kernel guy, Rust panicking/aborting on overflows, allocation failures, etc. is analogous to the kernel's C code detecting a double free and warning about it. To the kernel that is not safety at all - as he points out, it is only more debuggable.

He is allowing Rust in the kernel, so he clearly understands that Rust lets you shoot yourself in the foot a lot less than standard C - he is merely pointing out the reality that, in kernel space or even user space, that does not equate to absolute total safety. And as the chief kernel maintainer he is well within his rights to expect that tomorrow's kernel Rust programmers write code with this point in mind.

(IOW, as an example, he doesn't want to see Rust patches that ignore kernel realities in favor of Rust's supposedly magical safety guarantees: directly or indirectly allocating large chunks of memory can always fail in the kernel and needs to be accounted for even in Rust code.)
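
To make that concrete, here's a minimal userland sketch of treating allocation as fallible, using plain std Rust's `Vec::try_reserve` (an analogy only; the in-kernel Rust bindings expose their own fallible-allocation APIs):

```rust
use std::collections::TryReserveError;

// Sketch: surface allocation failure to the caller instead of letting
// the allocation abort. The caller can then retry, shrink the request,
// or propagate an error - as kernel code is expected to.
fn make_big_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve(len)?; // reports failure instead of aborting
    buf.resize(len, 0);    // no further allocation; capacity is reserved
    Ok(buf)
}
```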

replies(3): >>swingl+X2 >>titzer+r4 >>lake_v+de
2. swingl+X2[view] [source] 2022-10-02 15:49:43
>>blinki+(OP)
At least in user space, aborting an operation is much better than incorrect results. But the kernel being incorrect makes user space incorrect as well.

First of all, making a problem both obvious and easier to solve is better. Nothing "only" about it - it's better, both for the programmers and for the users. For the programmer the benefit is obvious; for the user, problems will simply be rarer, because the head start the programmer got makes the software better, faster.

Second, about the behavior. When you attempt to save changes to your document and a bug corrupts it, would you rather that fail with fanfare or succeed silently? How about the web page you visited with embedded malicious JavaScript from a compromised third party: would you rather the page close, or have your bank details up for sale on a foreign forum? When correctness is out the window, you must abort.

replies(5): >>alerig+v4 >>skybri+e7 >>evouga+u9 >>snovv_+Ba >>yencab+4X
3. titzer+r4[view] [source] 2022-10-02 15:59:09
>>blinki+(OP)
If that's what Linus is saying, then he needs to work on his communication skills, because that is not what he said. What he actually said is that dynamic errors should not be detected; they should be ignored. That's so antiquated and ignorant that I hope he meant what you said, but it's definitely not what he wrote.

As I posted upthread, the right way to handle this is to make dynamic errors either throw exceptions or kill the whole task, and to split the critical work into tasks that fail or complete as a whole, almost like transactions. The idea that the kernel should just go on limping in a f'd up state is bonkers.
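
Userland Rust can already express something like that task boundary; here's a minimal sketch using `std::panic::catch_unwind`. (Userland only: the kernel builds Rust without unwinding, which is exactly part of the constraint under discussion.)

```rust
use std::panic::{self, AssertUnwindSafe};

// Sketch: a dynamic error inside a task fails the task as a whole,
// and the caller decides what to do with the failed task.
fn run_task<T>(task: impl FnOnce() -> T) -> Result<T, &'static str> {
    panic::catch_unwind(AssertUnwindSafe(task)).map_err(|_| "task failed")
}

fn main() {
    let result = run_task(|| {
        let v: Vec<i32> = Vec::new();
        v[0] // out-of-bounds index: a dynamic error, panics at runtime
    });
    // The panic was contained to the task; the program keeps running.
    assert_eq!(result, Err("task failed"));
}
```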

replies(1): >>aspace+l5
◧◩
4. alerig+v4[view] [source] [discussion] 2022-10-02 15:59:17
>>swingl+X2
> Aborting an operation is much better than incorrect results.

Depends. Is a kernel panic better than something acting wrongly? I prefer my kernel not to panic, at the expense of some error somewhere that may or may not crash my system.

If you look at the output of `dmesg` on any Linux system you will often see errors even on a perfectly working system. Programs of that size are by definition not perfect: there are bugs, and the hardware itself has bugs, so you want the system to keep running even if something is not working 100% right. Most of the time you will not even notice it.

> First of all, making a problem both obvious and easier to solve is better.

It's the same with assertions: useful for debugging, but we all disable them in production, when the program is no longer in the hands of a developer but of the customer, since for a customer a system that crashes completely is worse than a system that has some bugs somewhere.
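
Incidentally, Rust bakes exactly this policy into the language. A minimal sketch, assuming the default cargo dev/release profiles:

```rust
fn apply_discount(price: u32, discount: u32) -> u32 {
    // Checked in debug builds only; compiled out of release builds.
    // This is the "disable assertions in production" policy as a feature.
    debug_assert!(discount <= price, "discount larger than price");

    // A plain assert! stays enabled in release builds, for the checks
    // you want to keep even in the customer's hands.
    assert!(price <= 1_000_000, "price out of supported range");

    price.saturating_sub(discount)
}
```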

replies(2): >>jjnoak+W7 >>swingl+jI
◧◩
5. aspace+l5[view] [source] [discussion] 2022-10-02 16:04:00
>>titzer+r4
> it's definitely not what he wrote.

I feel like we must have read two different articles. You sound crazy. Didn't read it your way at all.

> Think of that "debugging tools give a huge warning" as being the equivalent of std::panic in standard rust. Yes, the kernel will continue (unless you have panic-on-warn set), because the kernel MUST continue in order for that "report to upstream" to have a chance of happening.

"If the kernel shuts down the world, we don't get the bug report", seems like a pretty good argument. There are two options when you hit a panic in rust code:

* Panic and shut it all down. This prevents any reporting mechanism like a core dump. You cannot attach a normal debugger to the kernel.

* Ignore the panic and proceed with the knowledge that it failed, reporting the failure later.

The kernel is a single program, so it's not like you could just fork it before every Rust call and discard the fork if the call fails.

replies(2): >>titzer+u5 >>titzer+J6
◧◩◪
6. titzer+u5[view] [source] [discussion] 2022-10-02 16:05:10
>>aspace+l5
He wrote:

> In the kernel, "panic and stop" is not an option (it's actively worse than even the wrong answer, since it's really not debugable), so the kernel version of "panic" is "WARN_ON_ONCE()" and continue with the wrong answer.

(edit, and):

> Yes, the kernel will continue (unless you have panic-on-warn set), because the kernel MUST continue in order for that "report to upstream" to have a chance of happening.

Did I read that right? The kernel must continue? Yes, sure, absolutely... but maybe it doesn't need to continue at the very next instruction; maybe it can continue in an error handler? Is his thinking so narrow? I hope not.

replies(2): >>jstimp+xd >>gmueck+Cg
◧◩◪
7. titzer+J6[view] [source] [discussion] 2022-10-02 16:11:11
>>aspace+l5
Well, you've edited your reply a couple times, so it's a moving target, but:

> * Panic and shut it all down. This prevents any reporting mechanism like a core dump. You cannot attach a normal debugger to the kernel.

No one is really advocating that. Clearly you need to be able to write code that fails at a smaller granularity than the whole kernel. See my comment upthread for what I mean by that: dynamic errors fail smaller-granularity tasks, and handlers deal with tasks that fail because a safety check tripped.

replies(1): >>aspace+z9
◧◩
8. skybri+e7[view] [source] [discussion] 2022-10-02 16:14:01
>>swingl+X2
Yes, aborting an operation is usually better assuming you have some mechanism to do it safely. In the Linux kernel, apparently you often don't?

Although, often in embedded programming, a watchdog that resets the board can be the right thing to do. (As long as you don't get a boot loop.)

◧◩◪
9. jjnoak+W7[view] [source] [discussion] 2022-10-02 16:18:00
>>alerig+v4
> for a customer a system that crashes completely is worse than a system that has some bugs somewhere

This entirely depends on the industry and the customer. My team leaves asserts on in production code because our customers want aborts over silent misbehavior.

It is an order of magnitude cheaper for them if things fail loudly and they get a fix when compared to them tracking down quiet issues hours, days, or even months after the fact.

◧◩
10. evouga+u9[view] [source] [discussion] 2022-10-02 16:25:33
>>swingl+X2
Saving a document is a great example: I would much rather have the kernel corrupt 20% of my unsaved work on a document (with a warning about the corruption) than crash and delete 100% of it.
◧◩◪◨
11. aspace+z9[view] [source] [discussion] 2022-10-02 16:25:56
>>titzer+J6
Ease up on the snark, space ranger.

> dynamic errors fail smaller granularity tasks and handlers deal with tasks failing due to safety checks going bad.

Yes and that's why Rust is bad here (but it doesn't have to be). Rust _forces_ you to stop the whole world when an error occurs. You cannot fail at a smaller granularity. You have to panic. Period. This is why it is being criticized here. It doesn't allow you any other granularity. The top comment has some alternatives that still work in Rust.

replies(2): >>titzer+ca >>__jem+VY2
◧◩◪◨⬒
12. titzer+ca[view] [source] [discussion] 2022-10-02 16:29:07
>>aspace+z9
> You cannot fail at a smaller granularity.

Rust needs to fix that then. So we agree on that.

replies(1): >>Jweb_G+l01
◧◩
13. snovv_+Ba[view] [source] [discussion] 2022-10-02 16:30:43
>>swingl+X2
It depends on whether you care more about the correctness of this one component or about the uptime of the entire system.

A panic caused by the formatting in a rarely used log output taking down all of a large company's NTP servers simultaneously, for example, would not be seen as a reasonable tradeoff.

◧◩◪◨
14. jstimp+xd[view] [source] [discussion] 2022-10-02 16:44:59
>>titzer+u5
In the case of the WARN() macros, execution continues with whatever the code says next. There is no automatic stack unwinding in the kernel, and how errors should be handled (apart from being logged) must be decided case by case. It could just be handled with an early exit returning an error code, like other "more expected" errors.

The issue being discussed here is that Rust comes from a perspective of being able to classify errors and automate their handling. In the kernel it doesn't work like that, as we're working with more constraints than in userland. That includes hardware that doesn't behave the way it was expected to.
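
For illustration, a minimal sketch of that warn-and-early-exit shape, written here in plain Rust with made-up names (the real kernel does this in C, with WARN_ON_ONCE() and errno-style return codes):

```rust
// Illustrative only: the error type and names are hypothetical,
// not the kernel's actual Rust API.
#[derive(Debug)]
struct KernelError(i32);

const EINVAL: i32 = 22; // Linux errno for "invalid argument"

fn handle_request(len: usize, max: usize) -> Result<(), KernelError> {
    if len > max {
        // Log loudly, like WARN_ON_ONCE(), then keep the system running
        // by returning an error code for the caller to deal with.
        eprintln!("WARN: request length {len} exceeds max {max}");
        return Err(KernelError(EINVAL));
    }
    // ... normal processing ...
    Ok(())
}
```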

15. lake_v+de[view] [source] 2022-10-02 16:49:21
>>blinki+(OP)
Great explanation. I am not an expert on this, so your comment helped me understand. It sounds like Linus is just being a good kernel maintainer here, and clarifying a misunderstood technical term - safety.

It's not a condemnation of Rust, but rather a guidepost that, if followed, will actually benefit Rust developers.

◧◩◪◨
16. gmueck+Cg[view] [source] [discussion] 2022-10-02 17:02:31
>>titzer+u5
The error handler is the kernel. Whatever code runs to dump the panic somewhere must rely on some sort of device driver, which in turn must depend on other kernel subsystems and possibly other drivers to work.

There is an enormous variation in output targets for a panic on Linux: graphics hardware attached to PCIe (requires a graphics driver and possibly support from the PCIe bus master, I don't know), a serial interface (USART driver), serial via USB (serial-over-USB driver, USB protocol stack, USB root hub driver, whatever bus that is attached to)... There is a very real chance that the error reporting ends up hitting the same issue (e.g. some inconsistent data on the kernel heap) while reporting it, which would leave the developers with no information to work from if the kernel traps itself in an endless error-handling loop.

◧◩◪
17. swingl+jI[view] [source] [discussion] 2022-10-02 19:53:08
>>alerig+v4
> Depends. Is a kernel panic better than something acting wrongly? I prefer my kernel not to panic, at the expense of some error somewhere that may or may not crash my system.

That's a false dichotomy: you don't get to choose between definitely crashing and maybe crashing. That would be nice, but it's not on the menu. Crashing is just the best-case outcome of being incorrect, so if you can make your system stop instead of being incorrect, that's great.

> but we all disable them in production (assertions)

We don't all do that.

I concede that it depends on the use case. You might not care if you've got a single-user, non-networked gaming console, for example. A bug could even become a welcome part of the experience there. I hope these cases are the exception rather than the rule, though.

replies(1): >>alerig+5W
◧◩◪◨
18. alerig+5W[view] [source] [discussion] 2022-10-02 21:19:49
>>swingl+jI
> That's a false dichotomy: you don't get to choose between definitely crashing and maybe crashing. That would be nice, but it's not on the menu. Crashing is just the best-case outcome of being incorrect, so if you can make your system stop instead of being incorrect, that's great.

So you prefer a completely unusable system over a system that can still be used, but with some errors? If you prefer the first, you will be able to use practically nothing. If you look at the `dmesg` output of a running Linux system you can find a lot of errors; if even a single one of them were turned into a panic, your computer would not even be able to boot.

Nothing is perfect, and errors will appear. Ideally errors should be handled at the lowest possible level, but when they go unhandled they should not, in my view, result in a complete system crash.

> We don't all do that.

I do that. The reason is that, in my use case, not doing so would not only render the product completely unusable, but would also leave it impossible to fix with an over-the-air firmware update. So better that the system keeps running than that it crashes (and then reboots).

◧◩
19. yencab+4X[view] [source] [discussion] 2022-10-02 21:27:52
>>swingl+X2
> When you attempt to save changes to your document and a bug corrupts it, would you rather that fail with fanfare or succeed silently?

When your wifi driver crashes yet again, would you choose to discard all unsaved files open in your editor, just on the very unlikely possibility that they're corrupted now?

◧◩◪◨⬒⬓
20. Jweb_G+l01[view] [source] [discussion] 2022-10-02 21:52:39
>>titzer+ca
What was said is not actually true of Rust.
◧◩◪◨⬒
21. __jem+VY2[view] [source] [discussion] 2022-10-03 14:49:25
>>aspace+z9
> Rust _forces_ you to stop the whole world when an error occurs.

But... this isn't true??

[go to top]