1. swingl+(OP) 2022-10-02 15:49:43
At least in user space, aborting an operation is much better than incorrect results. But the kernel being incorrect makes user space incorrect as well.

First of all, making a problem both obvious and easier to solve is better. Nothing "only" about it: it's better. Better both for the programmers and for the users. For the programmer, the benefit is obvious; for the user, problems will simply be rarer, because the benefit the programmer received will make the software improve faster.

Second, about the behavior. When you attempt to save changes to your document, would you rather have the corruption of your document due to a bug fail with fanfare or succeed silently? What about a web page you visit that embeds malicious JavaScript from a compromised third party: would you rather the page close, or have your bank details for sale on a foreign forum? When correctness is out the window, you must abort.
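The save-file scenario can be made concrete with a minimal Python sketch of a save that fails loudly instead of silently replacing a good file with a bad one. The helper `save_document` is hypothetical and not taken from any real editor. It works in four steps:

1. It writes the new contents to a temporary file in the same directory.
2. It reads that file back.
3. It checks the read-back bytes against a SHA-256 of the intended contents.
4. Only if the check passes does it swap the file in with `os.replace`.

An aborted save therefore leaves the previous version of the document intact.

```python
import hashlib
import os
import tempfile


def save_document(path: str, data: bytes) -> None:
    """Hypothetical helper: save `data` to `path`, aborting rather than corrupting.

    The new contents go to a temporary file next to the target, are read back
    and compared against a checksum of what we meant to write, and only then
    replace the original via os.replace().
    """
    expected = hashlib.sha256(data).hexdigest()
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the bytes to disk
        with open(tmp_path, "rb") as check:
            actual = hashlib.sha256(check.read()).hexdigest()
        if actual != expected:
            # Fail with fanfare: the original document has not been touched.
            raise OSError(f"verification failed while saving {path!r}")
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # On any failure, drop the temporary file and re-raise the error.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The read-back check only catches problems visible to this process, and the data may be served from the page cache. The sketch also does not preserve the original file's permissions. It illustrates the abort-rather-than-corrupt idea, not a hardened implementation.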

2. alerig+y1 2022-10-02 15:59:17
>>swingl+(OP)
> Aborting an operation is much better than incorrect results.

Depends. Is a kernel panic better than something acting wrongly? I prefer my kernel not to panic, at the expense of some error somewhere that may or may not crash my system.

If you look at the output of `dmesg` on any Linux system, you will often see errors, even on a perfectly working system. This is because programs of that size are inevitably imperfect: there are bugs, and the hardware itself has bugs, so you want the system to keep running even if something is not working 100% right. Most of the time you will not even notice.

> First of all, making a problem both obvious and easier to solve is better.

It's the same with assertions: useful for debugging, but we all disable them in production, where the program is in the hands of the customer rather than a developer, since for a customer a system that crashes completely is worse than a system that has some bugs somewhere.

3. skybri+h4 2022-10-02 16:14:01
>>swingl+(OP)
Yes, aborting an operation is usually better, assuming you have some mechanism to do it safely. In the Linux kernel, apparently you often don't?

That said, in embedded programming a watchdog that resets the board is often the right thing to do (as long as you don't get a boot loop).
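The boot-loop caveat can be illustrated with a minimal sketch, written in Python for readability even though real firmware would typically be C. The hardware watchdog itself is not modeled. `BootState`, `choose_boot_mode`, `mark_boot_healthy`, and the threshold of 3 are all hypothetical. The sketch counts consecutive watchdog resets in memory that survives a reset. After too many in a row, it boots into a minimal safe mode instead of resetting forever.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_RESETS = 3  # hypothetical threshold


@dataclass
class BootState:
    """Stand-in for a counter kept in storage that survives a reset."""
    consecutive_watchdog_resets: int = 0


def choose_boot_mode(state: BootState, last_reset_was_watchdog: bool) -> str:
    """Decide how to boot after a reset.

    A watchdog reset recovers from a hang, but if the firmware keeps hanging
    soon after boot, resetting forever is a boot loop. After too many
    consecutive watchdog resets, fall back to a minimal safe mode.
    """
    if last_reset_was_watchdog:
        state.consecutive_watchdog_resets += 1
    else:
        state.consecutive_watchdog_resets = 0
    if state.consecutive_watchdog_resets >= MAX_CONSECUTIVE_RESETS:
        return "safe_mode"
    return "normal"


def mark_boot_healthy(state: BootState) -> None:
    """Call once the system has run long enough to be considered healthy."""
    state.consecutive_watchdog_resets = 0
```

The counter is cleared only after the system has proven healthy, not at the start of every boot. That is what lets the guard notice a crash that happens shortly after startup.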

4. jjnoak+Z4 2022-10-02 16:18:00
>>alerig+y1
> for a customer a system that crashes completely is worse than a system that has some bugs somewhere

This entirely depends on the industry and the customer. My team leaves asserts on in production code because our customers want aborts over silent misbehavior.

It is an order of magnitude cheaper for them when things fail loudly and they get a fix than when they have to track down quiet issues hours, days, or even months after the fact.
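A small Python illustration shows the difference between the two positions in this sub-thread. CPython emits no code for `assert` statements when run with `-O`, much as C's `assert` vanishes when `NDEBUG` is defined, so an invariant guarded only by `assert` is silently dropped in an optimized build. The hypothetical `check` helper below runs in every build and raises, which matches the "leave asserts on in production" policy described above. `apply_discount` is just an illustrative caller.

```python
def check(condition: bool, message: str) -> None:
    """Always-on invariant check (hypothetical helper).

    Unlike an `assert` statement, which CPython strips under -O, this runs in
    every build, so a violated invariant aborts the operation loudly instead
    of letting it continue with bad data.
    """
    if not condition:
        raise RuntimeError(f"invariant violated: {message}")


def apply_discount(price_cents: int, percent: int) -> int:
    """Illustrative business function guarded by an always-on check."""
    check(0 <= percent <= 100, f"percent={percent} is out of range")
    return price_cents * (100 - percent) // 100


def asserts_enabled() -> bool:
    """`__debug__` is False exactly when -O was given, i.e. when asserts are stripped."""
    return __debug__
```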

5. evouga+x6 2022-10-02 16:25:33
>>swingl+(OP)
Saving a document is a great example: I would much rather the kernel corrupt 20% of my unsaved work on a document (with a warning about the corruption) than crash and delete 100% of it.

6. snovv_+E7 2022-10-02 16:30:43
>>swingl+(OP)
It depends on whether you care more about the correctness of this one component or the uptime of the entire system.

For example, a panic caused by the formatting of a rarely used log message that takes down all of a large company's NTP servers simultaneously would not be seen as a reasonable tradeoff.

7. swingl+mF 2022-10-02 19:53:08
>>alerig+y1
> Depends. Is a kernel panic better than something acting wrongly? I prefer my kernel not to panic, at the expense of some error somewhere that may or may not crash my system.

That's a false dichotomy, you don't get to choose between definitely crashing or maybe crashing. That would be nice but it's not on the menu. Crashing is just the best case scenario, so if you can make your system stop instead of being incorrect, that's great.

> but we all disable them in production (assertions)

We don't all do that.

I concede that it depends on the use case. You might not care if you have a single-user, non-networked gaming console, for example. A bug could even become a welcome part of the experience there. I hope these cases are the exception rather than the rule, though.

8. alerig+8T 2022-10-02 21:19:49
>>swingl+mF
> That's a false dichotomy, you don't get to choose between definitely crashing or maybe crashing. That would be nice but it's not on the menu. Crashing is just the best case scenario, so if you can make your system stop instead of being incorrect, that's great.

So you prefer a completely unusable system to one that can be used, but with some errors? If you prefer the former, you will be able to use practically nothing. If you look at the `dmesg` output of a running Linux system, you can find a lot of errors; if even a single one of them were turned into a panic, your computer would not be able to boot.

Nothing is perfect, and errors will appear. Ideally, errors should be handled at the lowest possible level, but to me, unhandled errors should not result in a complete system crash.
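The opposite policy, keeping the system up when one part fails, can be sketched minimally in Python, assuming a main loop with named components. `run_tick` and the component names used with it are hypothetical. A component that raises is logged and disabled, while the others keep running; an over-the-air update service is one example. The tradeoff argued in this thread still applies: the failing component's work is simply lost, and anything that depended on it may now be acting on stale data.

```python
import logging
from typing import Callable, Dict, Set

log = logging.getLogger("supervisor")


def run_tick(components: Dict[str, Callable[[], None]], disabled: Set[str]) -> None:
    """Run one iteration of every enabled component, isolating failures.

    An unhandled exception in one component is logged and that component is
    added to `disabled`, but the loop, and every other component, keeps
    running instead of the whole process going down.
    """
    for name, step in components.items():
        if name in disabled:
            continue
        try:
            step()
        except Exception:  # KeyboardInterrupt/SystemExit still propagate
            log.exception("component %r failed; disabling it", name)
            disabled.add(name)
```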

> We don't all do that.

I do disable them. The reason is that in my use case, not doing so would render the product not only completely unusable but also impossible to fix with an over-the-air firmware update. So it's better for the system to keep running than to crash (and then reboot).

9. yencab+7U 2022-10-02 21:27:52
>>swingl+(OP)
> When you attempt to save changes to your document, would you rather have the corruption of your document due to a bug fail with fanfare or succeed silently?

When your wifi driver crashes yet again, would you choose to discard all unsaved files open in your editor, just because of the very unlikely possibility that they're now corrupted?
