zlacker

1. colord+(OP) 2022-06-21 02:39:11
It at least doesn't lock anything up that has a file open when the network goes down. NFS is a nightmare with that. NFS is more idiomatic on *nix but still a huge pain when dealing with matching file perms across systems.
replies(3): >>dikei+r2 >>Athas+Hh >>js2+ss
2. dikei+r2 2022-06-21 02:56:39
>>colord+(OP)
> It at least doesn't lock anything up that has a file open when the network goes down. NFS is a nightmare with that.

Yeah, we've been bitten by this too, around once a year, even with our fairly reliable and redundant network. It's a PITA, your process just hangs and there's no way to even kill it except restarting the server.
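
For anyone trying to confirm that this is what they are looking at: on Linux, a process stuck like this normally shows state `D` (uninterruptible sleep) in `/proc/<pid>/status`. A minimal sketch for spotting such processes, assuming a Linux-style `/proc`; the function names are made up for illustration, and a `D` state alone does not prove that NFS is the cause:

```python
from pathlib import Path


def parse_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            # e.g. "State:\tD (disk sleep)" -> ["State:", "D", "(disk", "sleep)"]
            return fields[1] if len(fields) > 1 else None
    return None


def uninterruptible_pids(proc_root="/proc"):
    """List PIDs whose state is 'D' (uninterruptible sleep). Linux only."""
    pids = []
    root = Path(proc_root)
    if not root.is_dir():
        return pids  # no /proc here, so nothing to report
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            text = (entry / "status").read_text(errors="replace")
        except OSError:
            continue  # process exited, or we lack permission
        if parse_state(text) == "D":
            pids.append(int(entry.name))
    return sorted(pids)
```

Calling `uninterruptible_pids()` on the affected client gives the list of candidates to inspect further.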

replies(2): >>toast0+Bf >>trasz+SF
◧◩
3. toast0+Bf 2022-06-21 05:11:39
>>dikei+r2
> It's a PITA, your process just hangs and there's no way to even kill it except restarting the server.

If you can bring the missing server back online, the NFS mount should recover.

4. Athas+Hh 2022-06-21 05:36:30
>>colord+(OP)
> It at least doesn't lock anything up that has a file open when the network goes down.

I must admit I feel quite a bit of irrational fury when this happens (similarly, when DNS lookups hang). That some other computer is down should never prevent me from doing, closing, or killing anything on my computer. Make the system call return an error immediately! Remove the process from the process table! Do anything! I can power cycle the computer to get out of it, so clearly a hanging NFS server is not some kind of black hole in our universe from which no escape is possible.
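
User code cannot make the blocked system call itself return, but a caller can at least decline to wait for it. A rough sketch of that idea: run the `stat` in a daemon thread and give up after a timeout. The name `path_responds` and the default timeout are arbitrary choices for illustration, and a thread that really is stuck stays stuck in the background:

```python
import os
import threading


def path_responds(path, timeout=2.0):
    """Return True if os.stat(path) finishes (successfully or with an error)
    within `timeout` seconds, False if it is still blocked."""
    finished = threading.Event()

    def probe():
        try:
            os.stat(path)
        except OSError:
            pass  # an error is still an answer; only a hang counts as failure
        finished.set()

    # Daemon thread: it is not joined when the interpreter shuts down.
    threading.Thread(target=probe, daemon=True).start()
    return finished.wait(timeout)
```

A script could call `path_responds("/mnt/share")` (a hypothetical mount point) before touching anything under it, and skip the work when it returns False.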

replies(1): >>voxada+Vl
◧◩
5. voxada+Vl 2022-06-21 06:17:52
>>Athas+Hh
> I must admit I feel quite a bit of irrational fury when this happens (similarly, when DNS lookups hang).

Neither of those reactions is in any way irrational. In fact, they're not only perfectly reasonable and understandable but felt by a great many of us here on HN.

6. js2+ss 2022-06-21 06:59:49
>>colord+(OP)
This is not the fault of NFS. The same thing would happen if a local filesystem suddenly went missing. The kernel treats NFS mounts as just another filesystem. You can in fact mount shares as soft or interruptible if you want.

https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storag...
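
To check which way existing mounts are configured, something like the following could work. It is a sketch that assumes `/proc/mounts`-style text (device, mount point, type, comma-separated options) and that a mount without an explicit `soft` option is hard, which nfs(5) documents as the default. The function name is made up, and variants such as `softerr` are deliberately not handled:

```python
def nfs_mount_modes(mounts_text):
    """Map each NFS mount point to 'soft' or 'hard' from /proc/mounts-style text."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed or empty line
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in ("nfs", "nfs4"):
            continue
        opts = options.split(",")
        # Assumption: no explicit 'soft' means hard (the nfs(5) default).
        # Only the plain 'soft' option is recognised here.
        modes[mount_point] = "soft" if "soft" in opts else "hard"
    return modes
```

On a Linux client the input would typically be the contents of `/proc/mounts`, read with `open("/proc/mounts").read()`.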

replies(1): >>dikei+6C
◧◩
7. dikei+6C 2022-06-21 08:12:28
>>js2+ss
Soft mount can lead to data inconsistency, so it's not always a good choice.
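
On a soft mount a timed-out request comes back to the application as an error, so part of the risk is code that never looks at the result of a write or an fsync. A defensive sketch, assuming the caller wants an exception on any failure; `write_checked` and the `.tmp` suffix are hypothetical, and this narrows the window for inconsistency without eliminating it:

```python
import os


def write_checked(path, data):
    """Write bytes to `path`, raising OSError if any step reports failure."""
    tmp_path = path + ".tmp"  # hypothetical naming scheme for the temp file
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # surface deferred write errors before trusting the file
    finally:
        os.close(fd)
    # Publish the file only after every step above succeeded.
    os.replace(tmp_path, path)
```

If any step raises, the destination file is left untouched and a stale `.tmp` file may remain for the caller to clean up.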
◧◩
8. trasz+SF 2022-06-21 08:47:38
>>dikei+r2
This sounds like a Linux client bug (failure to properly implement the “intr” mount option), not the fault of NFS itself.
replies(1): >>grossw+SJ
◧◩◪
9. grossw+SJ 2022-06-21 09:31:43
>>trasz+SF
It’s a failure to use the intr mount option. I’ve never had a problem using soft mounts either, which make the described problem nonexistent.
replies(1): >>jabl+0P
◧◩◪◨
10. jabl+0P 2022-06-21 10:31:32
>>grossw+SJ
intr/nointr are no-ops in Linux. From the nfs(5) manpage (https://www.man7.org/linux/man-pages/man5/nfs.5.html ):

> intr / nointr This option is provided for backward compatibility. It is ignored after kernel 2.6.25.

(IIRC when that change went in there were also some related changes to more reliably make processes blocked on a hung mount SIGKILL'able)

replies(1): >>smarks+5G1
◧◩◪◨⬒
11. smarks+5G1 2022-06-21 16:12:21
>>jabl+0P
This is too bad. The sweet spot was "hard,intr" at least when I was last using NFS on a daily basis (mid 1990s). Hard mounts make sense for programs, which will happily wait indefinitely while blocked in I/O. This worked well for things like doing a build over NFS, which would hang if the server crashed and then pick up right where it left off when the server rebooted.

Of course this is irritating if you're blocked waiting for something incidental, like your shell doing a search of PATH. In those cases you could just control-C and continue doing what you wanted to do (as long as it didn't actually need that NFS server).

However I can see that it would be difficult to implement interruptibility in various layers of the kernel.

replies(1): >>jabl+W72
◧◩◪◨⬒⬓
12. jabl+W72 2022-06-21 18:30:23
>>smarks+5G1
I think the current implementation comes reasonably close to the old "intr" behavior.

AFAICT the problem with "intr" wasn't that the kernel parts were impossible to implement, but rather an application correctness issue, as few applications are prepared to handle EINTR in any I/O syscall. However, with "nointr" the process would be blocked in uninterruptible sleep and would be impossible to kill.
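
To illustrate what "prepared to handle EINTR" means in practice, here is a sketch of the retry wrapper that every I/O call would need. Python surfaces EINTR as `InterruptedError`; the helper name and the attempt limit are arbitrary. Since Python 3.5 the standard library already retries most calls itself (PEP 475), so this only shows the pattern that C programs have to write by hand:

```python
def retry_on_eintr(call, *args, max_attempts=5):
    """Invoke call(*args), retrying when it raises InterruptedError (EINTR)."""
    for attempt in range(max_attempts):
        try:
            return call(*args)
        except InterruptedError:
            if attempt == max_attempts - 1:
                raise  # give up rather than loop forever
```

An application that forgets this around even one read or write would see spurious failures under "intr", which is the correctness problem described above.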

However, if the process is about to be killed by the signal, then not handling EINTR is irrelevant. Thus in 2.6.25 a new process state TASK_KILLABLE was introduced (https://lwn.net/Articles/288056/ ), which is a bit like TASK_UNINTERRUPTIBLE except the task can be interrupted by a fatal signal, and the NFS client code was converted to use it in https://lkml.org/lkml/2007/12/6/329 . So the end result is that the process can be killed with Ctrl-C (as long as it hasn't installed a non-default SIGINT handler), but doesn't need to handle EINTR for all I/O syscalls.
