“Rust is safe” is not some kind of absolute guarantee of code safety
1. jmilli+Fb 2022-10-02 15:34:06
>>rvz+(OP)
As usual, HN comments react to the headline without reading the content.

A lot of modern userspace code, including Rust code in the standard library, thinks that invariant failures (AKA "programmer errors") should cause some sort of assertion failure or crash (Rust or Go `panic`, C/C++ `assert`, etc). In the kernel, claims Linus, failing loudly is worse than trying to keep going because failing would also kill the failure reporting mechanisms.

He advocates for a sort of soft-failure, where the code tells you you're entering unknown territory and then goes ahead and does whatever. Maybe it crashes later, maybe it returns the wrong answer, who knows, the only thing it won't do is halt the kernel at the point the error was detected.

Think of the following Rust API for an array, which needs to be able to handle the case of a user reading an index outside its bounds:

  struct Array<T> { ... }
  impl<T> Array<T> {
    fn len(&self) -> usize;

    // if idx >= len, panic
    fn get_or_panic(&self, idx: usize) -> T;

    // if idx >= len, return None
    fn get_or_none(&self, idx: usize) -> Option<T>;

    // if idx >= len, print a stack trace and return
    // who knows what
    unsafe fn get_or_undefined(&self, idx: usize) -> T;
  }
The first two are safe by the Rust definition, because they can't cause memory-unsafe behavior. The last two (#2 and #3) are safe by the Linus/Linux definition, because they won't cause a kernel panic. If you have to choose between #1 and #3, Linus is putting his foot down and saying that the kernel's answer is #3.
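To make the three options concrete, here is a minimal runnable sketch. It rests on a few assumptions that are not in the API above: the array is backed by a `Vec<T>`, and option #3 is written as a safe stand-in named `get_or_warn` (a hypothetical name) that logs a backtrace and returns `T::default()` rather than truly undefined data. It only illustrates the "complain loudly, then keep going" policy and is not how the kernel implements it.

```rust
use std::backtrace::Backtrace;

/// Hypothetical array type for illustration; assumed to be backed by a Vec.
struct Array<T> {
    items: Vec<T>,
}

impl<T: Clone + Default> Array<T> {
    fn len(&self) -> usize {
        self.items.len()
    }

    /// #1: halt at the point of detection (slice indexing panics if idx >= len).
    fn get_or_panic(&self, idx: usize) -> T {
        self.items[idx].clone()
    }

    /// #2: report the failure in the return type and let the caller decide.
    fn get_or_none(&self, idx: usize) -> Option<T> {
        self.items.get(idx).cloned()
    }

    /// #3, soft-failure flavor: print a stack trace, then keep going with a
    /// placeholder value. Assumption: T::default() stands in for the
    /// "who knows what" of the original, so this version stays memory-safe.
    fn get_or_warn(&self, idx: usize) -> T {
        match self.items.get(idx) {
            Some(value) => value.clone(),
            None => {
                eprintln!(
                    "BUG: index {idx} out of bounds (len {}):\n{}",
                    self.len(),
                    Backtrace::force_capture()
                );
                T::default()
            }
        }
    }
}
```

With `let a = Array { items: vec![10, 20, 30] };`, calling `a.get_or_none(3)` gives `None`, `a.get_or_warn(3)` logs a trace and gives `0`, and `a.get_or_panic(3)` unwinds. The caller of `get_or_warn` may go on to compute a wrong answer, which is exactly the trade-off being argued about.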
◧◩
2. EdScho+Vf 2022-10-02 15:58:40
>>jmilli+Fb
The policy of ‘oopsing’ and limping on is, in my opinion, literally one of Linux’s worst features. It has bitten me in various cases:

- Remember when a bug in the leap-second handling code caused the Linux kernel to partially crash and eat 100% CPU? That caused a >1MW spike in power usage at Hetzner at the time. That must have been >1GW globally. Many people didn’t notice it immediately, so it must have taken weeks before everyone rebooted.

- I’ve personally run into issues where not crashing caused Linux to go on and eat my file system.

On any Linux server I maintain, I always toggle those sysctls that cause the kernel to panic on oops, and reboot on panic.
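For anyone who wants to check a box: as far as I know, the sysctls in question are `kernel.panic_on_oops` (non-zero means an oops escalates to a panic) and `kernel.panic` (zero means hang forever after a panic, positive means reboot after that many seconds, negative means reboot immediately). A rough sketch that reads them through `/proc/sys`; the helper names are my own, and it assumes a Linux host with `/proc` mounted:

```rust
use std::fs;

/// Parse the contents of a sysctl file that holds a single integer.
fn parse_sysctl(raw: &str) -> Option<i64> {
    raw.trim().parse().ok()
}

/// Read an integer sysctl from a /proc/sys path.
/// Returns None if the file is missing, unreadable, or not an integer.
fn read_sysctl(path: &str) -> Option<i64> {
    fs::read_to_string(path).ok().and_then(|s| parse_sysctl(&s))
}

/// True when an oops is escalated to a panic AND a panic leads to a reboot.
fn reboots_on_oops(panic_on_oops: i64, panic_timeout: i64) -> bool {
    panic_on_oops != 0 && panic_timeout != 0
}

/// Report the current policy, or None when the sysctls cannot be read
/// (for example on a non-Linux system).
fn current_policy() -> Option<bool> {
    let oops = read_sysctl("/proc/sys/kernel/panic_on_oops")?;
    let timeout = read_sysctl("/proc/sys/kernel/panic")?;
    Some(reboots_on_oops(oops, timeout))
}
```

If `current_policy()` returns `Some(false)`, the machine will either try to limp on after an oops or sit in a panicked state without rebooting.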

◧◩◪
3. mike_h+7k 2022-10-02 16:21:30
>>EdScho+Vf
So instead of a power spike, we'd have had a major internet outage across the world, across the entire industry and beyond, probably, if everyone had panicked on oops. The blame really lies with people not monitoring their systems.

As you said, you have the option to reboot on panic, but Linus is absolutely not wrong that this size does not fit all.

What about a medical procedure that WILL kill the patient if interrupted? What about life support in space? Hitting an assert in those kinds of systems is a very bad place to be, but an automatic halt is worse than at least giving the people involved a CHANCE to try and get to a state where it's safe to take the system offline and restart it.

◧◩◪◨
4. ok_dad+8n 2022-10-02 16:35:12
>>mike_h+7k
> What about a medical procedure that WILL kill the patient if interrupted? What about life support in space? Hitting an assert in those kinds of systems is a very bad place to be, but an automatic halt is worse than at least giving the people involved a CHANCE to try and get to a state where it's safe to take the system offline and restart it.

Kinda a strawman there. That's got to account for, what, 0.0001% of all use of computers, and probably they would never ever use Linux for these applications (I know medical devices DO NOT use Linux).

◧◩◪◨⬒
5. gmueck+xo 2022-10-02 16:41:55
>>ok_dad+8n
Do you know absolutely every medical device in existence and do you know how broad the definition of a medical device is (including e.g. the monitor attached to the PC used for displaying X-ray images)?
◧◩◪◨⬒⬓
6. jmilli+qp 2022-10-02 16:46:22
>>gmueck+xo

  > including e.g. the monitor attached to the PC used for displaying
  > X-ray images
Somewhat off-topic, but I used to work in a dental office. The monitors used for displaying X-rays were just normal monitors, bought from Amazon or Newegg or whatever big-box store had a clearance sale. Even the X-ray sensors themselves were (IIRC) not regulated devices, you could buy one right now on AliExpress if you wanted to.
◧◩◪◨⬒⬓⬔
7. gmueck+7C 2022-10-02 17:55:16
>>jmilli+qp
That's not the case in the EU. I've worked for an equipment manufacturer for dental clinics. While the monitors were allowed to be off the shelf, the operator (dental clinic) is required to make sure that they work properly and display the image correctly - obey certain brightness and color resolution/calibration standards. Our display software had to refuse to work on an uncalibrated monitor.
◧◩◪◨⬒⬓⬔⧯
8. alias_+t51 2022-10-02 21:05:54
>>gmueck+7C
Interesting, how does your software detect an uncalibrated monitor? Did it come with a calibration device which had to be used to scan the display output to check?

I don't suppose monitors report calibration data back to display adapters do they?

◧◩◪◨⬒⬓⬔⧯▣
9. Gauntl+N61 2022-10-02 21:15:16
>>alias_+t51
My guess is they had some heuristic based on EDIDs, which are incredibly easy to spoof.

https://smile.amazon.com/EVanlak-Passthrough-Generrtion-Elim...

◧◩◪◨⬒⬓⬔⧯▣▦
10. gmueck+B91 2022-10-02 21:34:28
>>Gauntl+N61
Yes, but why would you go to these lengths? The purpose of the whole mechanism is to prevent accidental misdiagnosis based on an incorrectly interpreted X-ray image. This isn't DRM, just a safeguard against incorrect use of equipment.
◧◩◪◨⬒⬓⬔⧯▣▦▧
11. Gauntl+YT1 2022-10-03 04:12:04
>>gmueck+B91
People are cheap and corrupt. The speed bump this presents is real, but minor, in the face of a couple medical shops looking to save $100/pop on a dozen monitors.

I hope it's rare, but I think a persistent nag window ("Your display isn't calibrated and may not be accurate") is probably a better answer than refusing to work altogether, because it will be clear about the source of the problem and less likely to get nailed down.

◧◩◪◨⬒⬓⬔⧯▣▦▧▨
12. kaba0+DM2 2022-10-03 12:52:49
>>Gauntl+YT1
Medical devices are insanely expensive (a CT scanner may reach a million dollars), so you won’t take a risk just to save $100 on such a small thing as a screen.