zlacker

Why is panicking in the kernel on an error not an option? Like, kernels can write a core dump and reboot, right?

replies(6): >>dijit+u >>pfortu+P >>zetapo+U >>pca006+o1 >>cillia+U2 >>pyb+S3

>>MarkSw+(OP)
If you panic and you're a kernel, you very likely corrupt your filesystem, at the very least.

replies(1): >>layer8+F1

>>MarkSw+(OP)
That is a decision taken by Linus. It could have gone differently, but life is about choices, and this one was made while Linus is the boss.

>>MarkSw+(OP)
Yeah ... Just reboot the machine and make me lose all my work, bro.

replies(1): >>charci+b9

>>MarkSw+(OP)
I guess when the kernel panics, there is nothing left to write the core dump for you...

replies(2): >>2OEH8e+72 >>detaro+b2

>>dijit+u
While I don’t advocate for kernel panics, journaling filesystems are a thing.

replies(2): >>dijit+62 >>hulitu+Y6

>>layer8+F1
Yes, but even then not all filesystems are journaled.

The EFI System Partition is FAT, and FAT is not journaled. You almost certainly have EFI these days.

replies(2): >>KMnO4+U3 >>layer8+c4

>>pca006+o1
kdump

https://en.wikipedia.org/wiki/Kdump_(Linux)

He also mentions that programs can report problems automatically to the distro devs. For example:

https://retrace.fedoraproject.org/faf/problems/

A kernel dump is not something you always want to upload since it can be large and contain sensitive info. I'm not a kernel dev though.

>>pca006+o1
The kernel crash dump mechanism works by reserving some memory. On a kernel panic, a fresh copy of the kernel is booted into that reserved memory, and it then takes care of reading the old dead kernel from memory and saving the dump.

Of course, this only works if the fresh kernel can come up and do that without itself crashing, so it can't capture every scenario. It also brings the system down completely, and there are lots of pros and cons to be argued about that vs attempting to continue or limp along.
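
For anyone who wants to see whether their own machine is set up this way, here is a rough sketch, not an official tool. As far as I know the reserved memory comes from the `crashkernel=` boot parameter, and `/sys/kernel/kexec_crash_loaded` reports whether a capture kernel is currently loaded. The function names `crashkernel_reservation` and `crash_kernel_loaded` are made up for this example.

```python
from pathlib import Path
from typing import Optional


def crashkernel_reservation(cmdline: str) -> Optional[str]:
    """Return the value of the crashkernel= boot parameter, or None if absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def crash_kernel_loaded(path: str = "/sys/kernel/kexec_crash_loaded") -> bool:
    """Return True if the sysfs flag says a capture kernel is loaded."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # Not Linux, or the kernel was built without kexec support.
        return False


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("crashkernel reservation:", crashkernel_reservation(cmdline))
    print("capture kernel loaded:", crash_kernel_loaded())
```

If the first line prints `None`, no memory was reserved at boot and there is nowhere for the fresh kernel to go.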

replies(1): >>yencab+V81

>>MarkSw+(OP)
From most users’ points of view, a lot of things the kernel does (e.g. a sound card driver) are non-critical so they’d prefer an error in that driver only killed that driver and not the whole kernel. Similarly, I’d be upset if a server rebooted because of a blip in its CD-ROM driver. And if you can just reload the module which errored, all the better.

It would be cool if kernel Rust could implement a panic handler which just killed the offending module, but I’m assuming from the discussion around panics that this isn’t possible.

replies(1): >>vips7L+I7

>>MarkSw+(OP)
No need to reboot the machine without warning, and lose data, when the rest of the kernel is probably still functional.

>>dijit+62
EFI is read, but not frequently written.

replies(1): >>dijit+s4

>>dijit+62
That’s a good point, but EFI isn’t frequently written I believe, so I would expect that to be a rare circumstance, and even rarer for user data to be affected as a consequence.

>>KMnO4+U3
I'm not sure why that's relevant.

It will be written to on every kernel update and every initramfs update at least, which is what.. once a week on average?

A reply like yours is not so subtly indicating that "it's fine to panic all the time because ultimately you might be fine if you get a panic", which I fundamentally disagree with, other concerns aside.

Also, you're suggesting that journaling filesystems are perfect and never lose data, which is also very untrue. In the default case they only protect metadata, and there are still circumstances where they can lose data anyway; they're more resilient, not immune.

replies(1): >>wtalli+T7

>>layer8+F1
Journaling filesystems can also become corrupted. That's why I don't use XFS: it's supposedly just a quick log replay after a kernel crash, but have a few crashes and the FS is corrupted beyond repair.

>>cillia+U2
Wasn’t that the whole point of microkernels/minix vs monoliths? With drivers being in the kernel, can you even restart the modules?

replies(1): >>cillia+sB

>>dijit+s4
> It will be written to on every kernel update and every initramfs update at least, which is what.. once a week on average?

Which distros actually use the EFI System Partition that way? I've usually only seen the ESP used to hold the bootloader itself, with kernels and initramfs and the bootloader config pointing to them stored either in a separate /boot partition or in a /boot directory of the / filesystem.

>>zetapo+U
This is why programs automatically saving their state is important.

replies(2): >>elteto+ye >>jmull+6h

>>charci+b9
No, this is why kernels prioritizing not crashing is important. Applications saving their work is a nice extra.

>>charci+b9
That's not a solution to OS instability.

Reliably saving state in the face of sudden total failure is both very tricky and app-specific. Just saving state changes automatically won't do it -- partial writes of complex state are likely to be inconsistent without luck or careful design and QA controls (tests, testing, on-going controls to ensure nothing new operates or relies on anything outside the safe state-saving mechanism).

It makes a lot more sense to put the effort into making the OS continue as well as it can, vs requiring every app to harden itself against sudden total failures.
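
To make the "careful design" part concrete, here is a minimal sketch of one common pattern for a single state file: write to a temporary file, fsync it, then rename it over the old one. This is not something proposed upthread. It assumes a POSIX system, and `save_state_atomically` is a made-up name.

```python
import json
import os
import tempfile


def save_state_atomically(path: str, state: dict) -> None:
    """Replace `path` with `state` so readers see the old or new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the rename below would not be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also fsync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Even this only covers one file on a filesystem that honors fsync. State spread across several files, a database, or a network service needs its own scheme, which is the app-specific part.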

>>vips7L+I7
With Linux you can unload and reload modules (rmmod, insmod) so it’s a little un-monolithic in that sense.

>>detaro+b2
The mechanism you describe is used/usable only in very specific scenarios.

For practically all non-virtualized Linux hosts out there, the kernel crash dump mechanism works by adding ASCII text to kmsg, which is then read by journald, processed a little, and appended to a file -- which just means it is submitted back to the kernel for writing, which means the FS needs to work, disk I/O needs to work, and so on.