zlacker

1. throw8+(OP)[view] [source] 2022-10-02 15:55:13
I've run into the same question often on MCUs: limp along in the hope of getting the error reported somehow, because otherwise it goes unnoticed unless a JTAG debugger is attached (and in the field, by default, none is).

So I can understand where Linus is coming from.

replies(2): >>gmueck+je >>mlindn+xe
2. gmueck+je[view] [source] 2022-10-02 17:10:32
>>throw8+(OP)
Yes. You could still hard reset after the error is reported if you wanted to. And if system availability matters, a hardware watchdog would handle the case where the error handling doesn't finish.
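
A rough host-side sketch of that idea in C, not real MCU code: the watchdog here is a simulated countdown, and every name (`watchdog_t`, `wdt_start`, `wdt_tick`, `handle_fatal_error`, `report_finishes`, `report_hangs`) is made up for illustration. On actual hardware the watchdog is a peripheral and the reset is a vendor-specific register write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated hardware watchdog: counts down once per tick and
 * latches a reset when it reaches zero. */
typedef struct {
    uint32_t remaining;   /* ticks left before the watchdog fires */
    bool     reset_fired; /* set once the watchdog has expired */
} watchdog_t;

void wdt_start(watchdog_t *w, uint32_t timeout_ticks) {
    w->remaining = timeout_ticks;
    w->reset_fired = false;
}

void wdt_tick(watchdog_t *w) {
    if (w->reset_fired) return;
    if (w->remaining > 0) w->remaining--;
    if (w->remaining == 0) w->reset_fired = true;
}

typedef enum { RESET_BY_SOFTWARE, RESET_BY_WATCHDOG } reset_cause_t;

/* One step of error reporting; returns true once the report is out. */
typedef bool (*report_step_fn)(uint32_t tick);

/* Limp along just long enough to report the error, then reset.
 * The fault path never kicks the watchdog, so a hung report
 * still ends in a reset. */
reset_cause_t handle_fatal_error(watchdog_t *w, report_step_fn step) {
    for (uint32_t tick = 0; ; tick++) {
        if (step(tick)) return RESET_BY_SOFTWARE; /* report done: hard reset */
        wdt_tick(w);
        if (w->reset_fired) return RESET_BY_WATCHDOG; /* report hung */
    }
}

/* Two illustrative reporters: one finishes after a few ticks, one hangs. */
bool report_finishes(uint32_t tick) { return tick >= 3; }
bool report_hangs(uint32_t tick) { (void)tick; return false; }
```

With a 10-tick timeout, `report_finishes` ends in a software reset and `report_hangs` ends in a watchdog reset, so the device recovers either way.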
3. mlindn+xe[view] [source] 2022-10-02 17:11:59
>>throw8+(OP)
Limping along is what the salespeople and the business people want, because failures look bad.

Engineers should want the immediate stop, because that's safer, especially in safety-critical situations.

replies(3): >>wtalli+Di >>warinu+i41 >>niscoc+ZL1
4. wtalli+Di[view] [source] [discussion] 2022-10-02 17:33:35
>>mlindn+xe
The kernel is not the whole system. The kernel needs to offer the "limping along" option so that the other parts of the system can implement whatever graceful failure method is appropriate for that system. There's no one size fits all solution for the kernel to pick.
5. warinu+i41[view] [source] [discussion] 2022-10-02 22:43:30
>>mlindn+xe
You sound like you code websites or something.

Real engineers, like, say, the people who code the machines that fly on Mars, don't want "oops that's unexpected, ruin the entire mission because that's safer". Same for the Linux kernel.

6. niscoc+ZL1[view] [source] [discussion] 2022-10-03 05:47:00
>>mlindn+xe
What are you talking about? Should planes stop flying when they encounter an error?

Safety-critical systems will try to recover to a working state as much as possible. They are designed with redundancy so that if one path fails, they can fall back to path 2 or path 3 to reach a safe, usable state.
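
A minimal sketch of that fallback pattern in C, purely for illustration: the path names (`primary_sensor`, `backup_sensor`, `last_known_value`) and the values they return are invented, and real systems add voting, timeouts, and health monitoring on top.

```c
#include <stdbool.h>
#include <stddef.h>

/* A redundant path: returns true and writes *out on success. */
typedef bool (*path_fn)(int *out);

/* Try each path in priority order. Returns the index of the first path
 * that succeeded, or -1 if every path failed. */
int first_working_path(const path_fn *paths, size_t n, int *out) {
    for (size_t i = 0; i < n; i++) {
        if (paths[i](out)) return (int)i;
    }
    return -1;
}

/* Hypothetical paths: the primary is broken, the backups still work. */
bool primary_sensor(int *out)   { (void)out; return false; }
bool backup_sensor(int *out)    { *out = 42; return true; }
bool last_known_value(int *out) { *out = 40; return true; }
```

Given the paths in the order primary, backup, last known value, the call returns index 1 and the backup's reading. Only when every path fails does the caller have to decide between stopping and degrading further.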
