Another way to have thumb-twiddling fun was for all the machines in a building to reboot - power failure, whatever - and then spend a lot of time waiting for each other's cross-mounted NFS shares to come up.
P.S.
> such as AFS (the Andrew File System)
That one prompted a senior engineer, on learning that it was to be deployed to undergrads' lab machines, to let slip "Why inflict it on the hapless livestock?"
> The very notion of a stateless filesystem is ridiculous. Filesystems exist to store state.
It's the protocol that's stateless, not the filesystem. I thought the article made a reasonable attempt to explain that.
Overall the article is reasonable but it omits one of the big issues with NFSv2, which is synchronous writes. Those Sun NFS implementations were based on Sun's RPC system; the server was required not to reply until the write had been committed to stable storage. There was a mount option to disable this, but using it exposed you to data corruption. Certain vendors (SGI, if I recall correctly) at some point claimed their NFS was faster than Sun's, but it implemented asynchronous writes. This resulted in the expected arguments over protocol compliance and reliability vs. performance.
This phenomenon led to various hardware "NFS accelerator" solutions that put an NVRAM write cache in front of the disk in order to speed up synchronous writes. I believe Legato and the still-existing NetApp were based on such technology. Eventually the synchronous writes issue was resolved, possibly by NFSv3, though the details escape me.
I've always just presumed the development of EFS recapitulated the evolution of NFS, in many cases quite literally, considering the EFS protocol is a flavor of NFS. S3 buckets are just blobs with GUIDs in a flat namespace, which is literally what stateless NFS is--every "file" has a persistent UID (GUID if you assume host identifiers are unique), providing a simple handle for submitting idempotent block-oriented read and write operations. Theoretically, EFS could just be a fairly simple interface over S3, especially if you can implicitly wave away many of the caveats (e.g. wrt shared writes) by simply pointing out they have existed and mostly been tolerated in NFS environments for decades.
Yeah, we've been bitten by this too, around once a year, even with our fairly reliable and redundant network. It's a PITA: your processes just hang and there's no way to even kill them except restarting the server.
Is there a sane, easy way to implement authentication? Last time I tried, IIRC, my options were LDAP or nil.
Under NFSv4, direct uid/gid is no longer used; instead the rpc.idmapd process handles the identity mapping. I'm not really sure how it works, beyond the fact that it keeps working when uid/gid synchronization is already in place for NFSv3 and the connection is upgraded.
There is also an NFS ACL standard, but I don't know anything about it.
NFSv3 and below trusts any uid/gids presented by the client unless they are squashed.
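To make that concrete: an AUTH_SYS (a.k.a. AUTH_UNIX) credential is nothing but numbers the client writes into the RPC header, so "presented by the client" really does mean "whatever the client feels like claiming". A rough Python sketch of how such a credential body gets built (following RFC 5531's authsys_parms; this only packs the bytes, it doesn't talk to a real server):

    import struct, time

    def xdr_string(s: bytes) -> bytes:
        # XDR string: 4-byte big-endian length, data, zero-padded to 4 bytes.
        pad = (4 - len(s) % 4) % 4
        return struct.pack(">I", len(s)) + s + b"\x00" * pad

    def auth_sys_cred(machine: bytes, uid: int, gid: int, groups=()):
        # authsys_parms: stamp, machinename, uid, gid, auxiliary gids.
        # Note there is no secret or signature anywhere in here.
        body = struct.pack(">I", int(time.time()) & 0xFFFFFFFF)   # stamp
        body += xdr_string(machine)
        body += struct.pack(">II", uid, gid)
        body += struct.pack(">I", len(groups))
        body += b"".join(struct.pack(">I", g) for g in groups)
        return body

    # A client can simply claim to be root; an NFSv3 server using AUTH_SYS
    # believes it unless root squashing maps uid 0 to nobody.
    cred = auth_sys_cred(b"some-client", uid=0, gid=0)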
SMB1 was slow - very slow. Novell IPX/SPX was far faster.
SMB2 changed the protocol to include multiple operations in a single packet, but did not introduce encryption (and Microsoft ignored other SMB encryption schemes). It is a LOT faster.
SMB3 finally adds encryption, but only runs in Windows 8 and above.
NFS is a bit messy on the question of encryption, but is a much more open and free set of tools.
It also has some discussion of the idempotent replay cache that is also covered in the original article.
https://www.kernel.org/doc/ols/2006/ols2006v2-pages-59-72.pd...
However, I did skim the paper, and it seems halfway reasonable, so I suppose I should read the whole thing. Of course nothing is above criticism, and there are many valid criticisms of NFS; but leading with “sucks” is just lazy.
"NFSv4 is a gigantic joke on everyone....NFSv4 is not on our roadmap. It is a ridiculous bloated protocol which they keep adding crap to. In about a decade the people who actually start auditing it are going to see all the mistakes that it hides.
"The design process followed by the NFSv4 team members matches the methodology taken by the IPV6 people. (As in, once a mistake is made, and 4 people are running the test code, it is a fact on the ground and cannot be changed again.) The result is an unrefined piece of trash."
https://blog.fosketts.net/2015/02/03/vsphere-6-nfs-41-finall...
https://docs.oracle.com/en/database/oracle/oracle-database/1...
looooool
(Seriously, though, could someone tell me why this was supposed to make sense?)
Locking has historically always been a problem in NFS. Kirch mentions that NLM was designed for Posix semantics only. I frankly don't know if NLM is related to `rpc.lockd` which appeared in SunOS 4 and possibly even SunOS 3 (mid 1980s at this point) which well predates anything having to do with Posix. Part of the problem is the confused state of file locking in the Unix world, even for local files. There was BSD-style `flock` and SYSV-style `lockf` and there might even have been multiple versions of those. Implementing these in a distributed system would have been terribly complex. Even at Sun, at least through the mid 1990s, the conventional wisdom was to avoid file locking. If you really needed something that supported distributed updates, it was better to use a purpose-built network protocol.
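For anyone who hasn't had the pleasure, here's roughly what the two local locking styles look like (a small Python sketch; fcntl.flock is the BSD-style whole-file lock, and fcntl.lockf wraps the POSIX byte-range style that NLM was meant to export over the wire):

    import fcntl, os

    fd = os.open("shared.dat", os.O_RDWR | os.O_CREAT, 0o644)

    # BSD-style: advisory whole-file lock, tied to the open file description.
    fcntl.flock(fd, fcntl.LOCK_EX)
    fcntl.flock(fd, fcntl.LOCK_UN)

    # POSIX-style: advisory byte-range lock, owned by the (process, file) pair.
    # Lock and unlock the first 100 bytes of the file.
    fcntl.lockf(fd, fcntl.LOCK_EX, 100, 0, os.SEEK_SET)
    fcntl.lockf(fd, fcntl.LOCK_UN, 100, 0, os.SEEK_SET)

    os.close(fd)

The two have different ownership and inheritance rules, which is part of why exporting either of them faithfully over a nominally stateless protocol was such a mess.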
One thing "willy" got right in his comment is that NFS is an example of "worse is better". In its early version, it had the benefit of being relatively simple, as acknowledged in the LWN article. This made it easy to port and reimplement and thus it became widespread.
Of course being simple means there are lots of tradeoffs and shortcomings. To address these you need to make things more complex, and now things are "ridiculous" and "bloated". Oh well.
Notably, OpenBSD has an IPv6 and IPSec (including IKE) stack second to none. If OpenBSD developers actually had a need for the features provided by NFSv4, I'm sure OpenBSD would have an exceptionally polished and refined--at least along the dimensions they care about--implementation. But they don't. What they do have are relatively well-maintained NFSv3 and YP stacks (not even NIS!), because those things are important to Theo, especially for (AFAIU) maintaining the build farm and related project infrastructure.
I've worked somewhere with a lot of NFS, and they had centralized account management, so everything was fine other than actual security, at least until we hit the limit of 16-bit uids. That place had a different centralized account management for production, so uids weren't consistent between corp and prod, but NFS in prod was very limited. (And you wouldn't nfs between corp and prod either)
I worked somewhere else without real centralized management of accounts on prod, and it was a PITA to bring that back under control, when it started becoming important. Even without intentional use of uids, it's convenient that they all line up on all servers; and it's a pain to change a uid that already exists on the system.
If you can bring the missing server back online, the NFS mount should recover.
I must admit I feel quite a bit of irrational fury when this happens (similarly, when DNS lookups hang). That some other computer is down should never prevent me from doing, closing, or killing anything on my computer. Make the system call return an error immediately! Remove the process from the process table! Do anything! I can power cycle the computer to get out of it, so clearly a hanging NFS server is not some kind of black hole in our universe from which no escape is possible.
The NFS protocol wasn't just stateless, but also securityless!
Stewart, remember the open secret that almost everybody at Sun knew about, in which you could tftp a host's /etc/exports (because tftp was set up by default in a way that left it wide open to anyone from anywhere reading files in /etc) to learn the name of all the servers a host allowed to mount its file system, and then in a root shell simply go "hostname foo ; mount remote:/dir /mnt ; hostname `hostname`" to temporarily change the CLIENT's hostname to the name of a host that the SERVER allowed to mount the directory, then mount it (claiming to be an allowed client), then switch it back?
That's right, the server didn't bother checking the client's IP address against the host name it claimed to be in the NFS mountd request. That's right: the protocol itself let the client tell the server what its host name was, and the server implementation didn't check that against the client's ip address. Nice professional protocol design and implementation, huh?
Yes, that actually worked, because the NFS protocol laughably trusted the CLIENT to identify its host name for security purposes. That level of "trust" was built into the original NFS protocol and implementation from day one, by the geniuses at Sun who originally designed it. The network is the computer is insecure, indeed.
And most engineers at Sun knew that (and many often took advantage of it). NFS security was a running joke, thus the moniker "No File Security". But Sun proudly shipped it to customers anyway, configured with terribly insecure defaults that let anybody on the internet mount your file system. (That "feature" was undocumented, of course.)
While I was a summer intern at Sun in 1987, somebody at Sun laughingly told me about it, explaining that was how everybody at Sun read each other's email. So I tried it out by using that technique to mount remote NFS directories from Rutgers, CMU, and UMD onto my workstation at Sun. It was slow but it worked just fine.
I told my friend Ron Natalie at Rutgers, who was Associate Director of CCIS at the time, that I was able to access his private file systems over the internet from Sun, and he rightfully freaked out, because as a huge Sun customer in charge of security, nobody at Sun had ever told him about how incredibly insecure NFS actually was before, despite all Sun's promises. (Technically I was probably violating the terms of my NDA with Sun by telling him that, but tough cookies.)
For all Sun's lip service about NFS and networks and computers and security, it was widely known internally at Sun that NFS had No File Security, which was why it was such a running inside joke that Sun knowingly shipped it to their customers with such flagrantly terrible defaults, but didn't care to tell anyone who followed their advice and used their software that they were leaving their file systems wide open.
Here is an old news-makers email from Ron from Interop88 that mentions mounting NFS directories over the internet -- this was after I'd told him about NFS's complete lack of security, so he'd probably secured his own servers a bit by overriding the tftp defaults by then, and he was able to mount it because he remembered one of the host names in /etc/exports and didn't need to fetch it with tftp to discover it:
>From: Ron Natalie <elbereth.rutgers.edu!ron.rutgers.edu!ron@rutgers.edu> Date: Wed, Oct 5, 1988, 4:09 AM To: NeWS-makers@brillig.umd.edu
>I love a trade show that I can walk into almost any booth and get logged in at reasonable speed to my home machine. One neat experiment was that The Wollongong Group provided a Sun 3/60C for a public mail reading terminal. It was lacking a windowing system, so I decided to see if I could start up NeWS on it. In order to do that, I NFS mounted the /usr partition from a Rutgers machine and Symlinked /usr/NeWS to the appropriate directory. This worked amazingly well.
>(The guys from the Apple booth thought that NeWS was pretty neat, I showed them how to change the menus by just editing the user.ps file.)
>-Ron
I posted about this fact earlier:
https://news.ycombinator.com/item?id=21102724
>DonHopkins on Sept 28, 2019, on: A developer goes to a DevOps conference
>I love the incredibly vague job title "Member, Technical Staff" I had at Sun. It could cover anything from kernel hacking to HVAC repair!
>At least I had root access to my own workstation (and everybody else's in the company, thanks to the fact that NFS actually stood for No File Security).
>[In the late 80's and early 90's, NFSv2 clients could change their hostname to anything they wanted before doing a mount ("hostname foobar; mount server:/foobar /mnt ; hostname original"), and that name would be sent in the mount request, and the server trusted the name the client claimed to be without checking it against the ip address, then looked it up in /etc/exports, and happily returned a file handle.
>If the NFS server or any of its clients were on your local network, you could snoop file handles by putting your ethernet card into promiscuous mode.
>And of course NFS servers often ran TFTP servers by default (for booting diskless clients), so you could usually read an NFS server's /etc/exports file to find out what client hostnames it allowed, then change your hostname to one of those before mounting any remote file system you wanted from the NFS server.
>And yes, TFTP and NFS and this security hole you could drive the space shuttle through worked just fine over the internet, not just the local area network.]
Sun's track record on network security isn't exactly "stellar" and has "burned" a lot of people (pardon the terrible puns, which can't hold a candle to IBM's "Eclipse" pun). The other gaping security hole at Sun I reported was just after the Robert T Morris Worm incident, as I explained to Martha Zimet:
>Oh yeah, there was that one time I accidentally hacked sun.com’s sendmail server, the day after the Morris worm.
>The worm was getting in via sendmail’s DEBUG command, which was usually enabled by default.
>One of the first helpful responses that somebody emailed around was a suggestion for blocking the worm by editing your sendmail binary, searching for DEBUG, and replacing the D with a NULL character.
>Which the genius running sun.com apparently did.
>That had the effect of disabling the DEBUG command, but enabling the zero-length string command!
>So as I often did, I went “telnet sun.com 25” to EXPN some news-makers email addresses that had been bouncing, and first hit return a couple of times to flush the telnet negotiation characters it sends, so the second return put it in debug mode, and the EXPN returned a whole page full of diagnostic information I wasn’t expecting!
>I reported the problem to postmaster@sun.com and they were like “sorry oops”.
I've mentioned that one a couple of times before:
Neither of those reactions is in any way irrational. In fact, they're not only perfectly reasonable and understandable but felt by a great many of us here on HN.
NFSv3 "fixed" the write issue by adding a separate COMMIT RPC:
"The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns. This specification defines a method for committing these unsafe writes to stable storage in a reliable way."[1]
[0] https://twitter.com/aka_pugs/status/1225665691472166912 [1] https://datatracker.ietf.org/doc/html/rfc1813#section-1.1
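For the curious, here's a toy Python model of the client-side bookkeeping that unsafe writes plus COMMIT imply (nothing here is a real NFS API; the function names are made up and it only illustrates the verifier check):

    import random

    # Toy stand-in for the server's boot verifier; it changes if the server reboots.
    SERVER_BOOT_VERIFIER = random.getrandbits(64)

    def nfs_write_unstable(offset, data):
        # UNSTABLE write: the server may only have buffered the data in RAM.
        return SERVER_BOOT_VERIFIER

    def nfs_commit(offset, count):
        # COMMIT: the server flushes buffered writes and returns its verifier.
        return SERVER_BOOT_VERIFIER

    def write_then_commit(chunks):
        pending, offset, first_verf = [], 0, None
        for data in chunks:
            verf = nfs_write_unstable(offset, data)
            first_verf = verf if first_verf is None else first_verf
            pending.append((offset, data))
            offset += len(data)
        if nfs_commit(0, offset) != first_verf:
            # Verifier changed: the server rebooted and may have lost the
            # unstable writes, so the client must replay them from its cache.
            for off, data in pending:
                nfs_write_unstable(off, data)
            nfs_commit(0, offset)

    write_then_commit([b"hello ", b"world\n"])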
Sadly, with a Linux NFS server, lock state eventually corrupts itself to extinction, but OmniOS can tick along past 300 days of uptime without a problem.
Of course... these issues only show up under production levels of load, and have never been distilled into a reproducible test case.
FML, and FNL ( = fricking NFS locking :P)
I seem to recall that checking the link count on the temporary file is all that was needed:
https://github.com/jaysoffian/dotlock
(That code is from 2001 or so.)
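If memory serves, the trick goes something like this (a hedged Python sketch of the general technique, not the actual dotlock code): create a uniquely named temp file, hard-link it to the lock name, and then trust only the temp file's link count, because over NFS the link() reply can be lost even when the operation itself succeeded on the server:

    import os, socket, time

    def acquire_dotlock(lockpath):
        tmp = "%s.%s.%d.%d" % (lockpath, socket.gethostname(),
                               os.getpid(), int(time.time()))
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o444)
        os.close(fd)
        try:
            try:
                os.link(tmp, lockpath)
            except OSError:
                pass                     # ignore; the stat below is authoritative
            # If the link count is 2, our link() really did create the lock,
            # even if the reply got lost and a retry returned EEXIST.
            return os.stat(tmp).st_nlink == 2
        finally:
            os.unlink(tmp)

    if acquire_dotlock("mailbox.lock"):
        print("got the lock")
        os.unlink("mailbox.lock")        # release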
[0] https://github.com/freebsd/freebsd-src/blob/main/include/rpc...
Did you find my criticism of how X-Windows sucks in the Unix Haters Handbook as unfair and un-credible and lazy as you found the book's criticism of NFS? Or my criticism of OLWM and XBugTool, which also both sucked?
https://web.archive.org/web/20000423081727/http://www.art.ne...
Did you ever fix those high priority OLWM bugs I reported with XBugTool that OLWM unnecessarily grabbed the X11/NeWS server all the time and caused the input queue to lock up so you couldn't do anything for minutes at a time? And that related bug caused by the same grab problem that the window system would freeze if you pressed the Help key while resizing a window?

Or manage to get OLWM's showcase Open Look menus to pin up without disappearing for an instant then reappearing in a different place, with a totally different looking frame around it, and completely different mouse tracking behavior? That unnecessary song and dance completely ruined the "pinned menu" user experience and pinned menu metaphor's illusion that it was the same menu before and after pinning. While TNT menus simply worked and looked perfectly and instantly when you pinned them, because it WAS the same menu after you pinned it, so it didn't have to flicker and change size and location and how it looked and behaved.

Ironically, the NeWS Toolkit was MUCH better at managing X11 windows than the OLWM X11 window manager ever was, because our NeWS based X11 window manager "OWM" was deeply customizable and had a lot more advanced features like multiple rooms, scrolling virtual desktops, tabbed windows supporting draggable tabs on all four edges, resize edges, custom resize rubber banding animation, and pie menus, as well. It also never grabbed and froze the window server, and it took a hell of a lot less time and resources to develop than OLWM, which never lifted a finger to support TNT the way TNT bent over backwards to support X11.
NeWS Tab Window Demo -- Demo of the Pie Menu Tab Window Manager for The NeWS Toolkit 2.0. Developed and demonstrated by Don Hopkins:
https://www.youtube.com/watch?v=tMcmQk-q0k4
https://web.archive.org/web/19981203002306/http://www.art.ne...
>I39L window management complicates pinned menus enormously. TNT menus pin correctly, so that when you push the pin in, the menu window simply stays up on the screen, just like you'd expect. This is not the case with XView or even OLWM. Under an I39L window manager, the Open Look pinned menu metaphor completely breaks down. When you pin an X menu, it dissappears from the screen for an instant, then comes back at a different place, at a different size, with a different look and feel. If you're not running just the right window manager, pinned menus don't even have pins! There is no need for such "ICCCM compliant" behavior with TNT menus. When they're pinned, they can just stay there and manage themselves. But were TNT windows managed by an external I39L window manager, they would have to degenerate to the level of X menus.
https://web.archive.org/web/20000602215640/http://www.art.ne...
>I could go on and on, but I just lost my wonderful xbugtool, because I was having too much fun way too fast with those zany scrolling lists, so elmer the bug server freaked out and went off to la-la land, causing xbugtool to lock the windows and start "channeling", at the same time not responding to any events, so when I clicked on the scrolling list, the entire window system froze up and I had to wait for the input queue lock to break, but by the time the lock finally broke (it must have been a kryptonite), xbugtool had made up its mind, decided to meet its maker, finished core dumping, and exited with an astoundingly graceful thrash, which was a totally "dwim" thing for it to do at the time, if you think about it with the right attitude, since I had forgotten what I wanted to file a bug against in the first place anyway, and totally given up the idea of ever using bugtool to file a bug against itself, because bugtool must be perfect to have performed so splendidly!
From the news-makers archive:
>From: Skip Montanaro <crdgw1!montnaro@uunet.uu.net> Date: Feb 16, 1990
>Charles Hedrick writes concerning XNeWS problems. I have a couple of comments on the XNeWS situation.
>The olwm/pswm interface appears (unfortunately) to be stable as far as Sun is concerned. During XNeWS beta testing I complained about the lack of function key support, but was told it was an OpenLook design issue. (NeWS1.1 supported function keys, and you could do it in PostScript if you like.) Sun likes to tout how OpenLook is standard, and was designed by human factors types. As far as I'm concerned, nobody has had enough experience with good user interfaces to sit down and write a (horribly large, hard-to-read) spec from which a window manager with a "good" look-and-feel will be created. I'm convinced you still have to experiment with most user interfaces to get them right.
>As a simple example, consider Don Hopkins' recent tabframes posting. An extra goody added in tabframes is the edge-stretch thingies in the window borders. You can now stretch one edge easily, without inadvertently stretching the other edge connected to your corner-stretch thingie. Why did the OpenLook designers never think of this? SunView had that basic capability, albeit without visible window gadgetry. It wasn't like the idea was completely unheard of.
>I agree that running the XNeWS server with an alternate window manager is a viable option. Before I got my SPARCStation I used XNeWS in X11ONLY mode with gwm, which was the only ICCCM-compliant window manager I had available to me at the time. If you choose to use twm with XNeWS, I recommend you at least try the X11R4 version.
>From: William McSorland - Sun UK - Tech Support <will@willy.uk> Date: May 14, 1991 Subject: 1059370: Please evaluate
>Bug Id: 1059370 Category: x11news Subcategory: olwm Bug/Rfe: rfe Synopsis: OLWM does a Server Grab while the root menu is being displayed. Keywords: select, frame_busy, presses, left, mouse, server, grabbed Severity: 5 Priority: 5 Description:
>Customer inisisted on having this logged as a RFE and so:-
>When bringing up the root menu inside OW2.0 the window manager does a Server Grab hence forcing all its client applications output to be queued by the server, but not displayed.
>The customer recommends that this should be changed to make olwm more friendly.
>Apparently a number of other window managers don't do a server grab while the root menu is being displayed.
>From: Don Hopkins <hopkins@sun.com> Subject: 1059974: Bug report created
>Bug Id: 1059974 Category: x11news Subcategory: server Bug/Rfe: bug Synopsis: I have no mouse motion and my input focus is stuck in xbugtool!!! Keywords: I have no mouth and I must scream [Harlan Ellison] Severity: 1 Priority: 1 Description:
>This is my worst nightmare! None of my TNT or XView applications are getting any mouse motion events, just clicks. And my input focus is stuck in xbugtool, of all places!!! When I click in cmdtool, it gets sucked back into xbugtool when I release the button! And I'm not using click-to-type! I can make selections from menus (tnt, olwm, and xview) if I click them up instead of dragging, but nobody's receiving any mouse motion!
>I just started up a fresh server, ran two jets and a cmdtool, fired up a bugtool from one of the jets (so input focus must have been working then), and after xbugtool had throbbed and grunted around for a while and finally put up its big dumb busy window, I first noticed something was wrong when I could not drag windows around!
>Lucky thing my input focus ended up stuck in xbugtool!
>The scrollbar does not warp my cursor either... I can switch the input focus to any of xbugtool's windows, but I can't ... -- oomph errrgh aaaaahhh! There, yes!
>Aaaaah! What a relief! It stopped! I can move my mouse again!! Hurray!!! It started working when I opened a "jet" window, found I could type into it, so I moved the mouse around, the cursor disappeared, I typed, there were a couple of beeps, I still couldn't find the cursor, so I hit the "Open" key, the jet closed to an icon, and I could type to xbugtool again! And lo and behold now I can type into the cmdtool, too! Just by moving my cursor into it! What a technological wonder! Now I can start filing bug reports against cmdtool, which was the only reason I had the damn thing on my screen in the first place!!! I am amazed at the way the window system seems to read my mind and predict my every move, seeming to carry out elaborate practical jokes to prevent me from filing bugs against it. I had no idea the Open Windows desktop had such sophisticated and well integrated interclient communication!
>From: Don Hopkins <hopkins@sun.com> Subject: 1059976: Bug report created Date: May 21, 1991
>Bug Id: 1059976 Category: x11news Subcategory: olwm Bug/Rfe: bug Synopsis: OLWM menus are inconsistant with the rest of the desktop, Keywords: pinned menus, defaults, tracking, inconsistant look and feel, yet another open look toolkit Severity: 2 Priority: 2 Description:
>You can't set the default of a pinned menu by holding down the control key and clicking over an item.
>Pressing the middle button over the default of a pinned menu erases the default ring.
>You can't set the default of a unpinned menu by pressing the control key then the popping it up by pressing the MENU button on the mouse.
>When you're tracking a menu, and press the control key, the highlighting changes properly, from depressed to undepressed with a default ring, but when you release the control key before releasing the MENU button on the mouse, the highlighting does not change back to depressed without a default ring. Instead it stays the same, then changes to un-depressed without a default ring at the next mouse movement, and you have to move out and back into the menu item to see it depressed again.
>When you're dragging over a menu, then press the control key to set the default, then release the mouse button to make a selection, without releasing the control key, OLWM menus are stuck in default-setting mode, until the next time it sees a control key up transition while it is tracking a menu.
>Clicking the SELECT button on the abbreviated menu button on the upper left corner of the window frame (aka the close box or shine mark) should select the frame menu default, instead of always closing the window.
>The tracking when you press and release the control key over a menu pin is strange, as well. Push-pins as menu defaults are a dubious concept, and the HIT people should be consulted for an explaination or a correction.
>When you press the menu button in a submenu, it does not set the default of the menu that popped up the submenu, the way XView and TNT menus do. This behaviour also needs clarification from the HIT team.
>Pinned OLWM menus do not track the same way as unpinned menus. When you press down over an item, and drag to another item, the highlighting does not follow the mouse, instead the original item stays highlighted. The item un-highlights when the mouse exits it, but the menu highlighting should track the item underneath the cursor when the user is holding the mouse button down, just like they do with a non-pinned menu. The current behavior misleads you that the item would be selected if the button were released even though the cursor is not over the menu item, and is very annoying when you press down over a pinned menu item and miss the one you want, and expect to be able to simply drag over to the one you meant to hit.
>If we are crippling our menus this way on purpose, because we are afraid Apple is going to sue us, then Apple Computer has already done damage to Sun Microsystems without even paying their lawyers to go to court. We licensed the technology directly from Xerox, and we should not make our users suffer with an inferior interface because we are afraid of boogey-men.
>In general, OLWM is yet another OpenLook toolkit, and its menus are unlike any other menus on the desktop. This is a pity because the user interacts so closely with OLWM, and the conspicuous inconsistancy between the window manager and the Open Look applications that it frames leads people to expect inconsistancy and gives the whole system a very unreliable, unpredictable feel.
Other email about OLWM server grabs:
>1056853 x11news/x11: >OW Exit notice hangs system causing input queue lock brokens
>Status: Desktop integration issue. Marked in bug traq as evaluated.
>This is an X-and-NeWS integration issue and is a terribly complicated problem. X11 does not expect that server-internal locking will ever time out. X11 has similar situations to the one mentioned in the bug report where a client grabs the server and then a passive grab triggers. And it works fine. The difference is that the effect of the passive grab doesn't time-out, thereby causing an inconsistent state.
>One possibility cited for a fix is to change the server to stop distributing events to synchronous NeWS interests while an X client has the server grabbed. But this might only result in moving the problem around and not in solving the real problem.
>According to Stuart Marks, olwm could grab the keyboard and mouse before grabbing the server and that might get around this particular problem.
>ACTION: Stuart Marks should be supported in making this change.
Stewart, is any of that lazy unfair criticism?
https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storag...
The kernel on the server would do the work of unlinking the file, which might take many seconds. In the meanwhile, the client would time out and make another NFS call to unlink. ext2/3 would have removed the path from the visible filesystem namespace even though the unlink hadn't completed, so this second call would return ENOENT. Somewhat confusing to users!
I also had in mind that OpenBSD deliberately and rigorously only refers to "YP" ("Yellow Pee"). Google "OpenBSD" and "NIS" and most of the hits you'll see directly from the OpenBSD project are from commit logs for patches removing accidental usages of "NIS" in initial YP-related feature commits. I'm not quite sure why they do that. I've kind of assumed it's to make clear that they have little interest in addressing vendor compatibility issues, and to emphasize that YP support, such as it is, is narrowly tailored to supporting the needs of the OpenBSD project itself. That's quite different from IPv6, IPSec/IKE, and even NFSv3, where cross-vendor interoperability is a concern (within reason).
I ran an HPC cluster for a university, and relied upon good old NFSv3 for shared file storage (both home directories and research datasets). In addition, I built out a big set of software compiled on one server and made available to the entire cluster via a read-only NFS mount point. The whole thing works reliably without any hiccups whatsoever. To overcome the limitations of authentication and authorisation with NFS storage, we use a centralised FreeIPA server so that all machines in the cluster have the same UID/GID mapping everywhere.
As icing on the cake, the storage we expose over NFS is ZFS, which integrates nicely with NFS.
Update 1: Yes, data security is a bit of an afterthought with NFS. Anybody on my network with physical access can mount my central storage on another server and access the data as long as they can recreate the UID/GID locally... but if I let someone get that far physically, I have bigger problems to deal with first.
That's an impressively long time for anyone born and working professionally for Sun Microsystems before The Early Years of NFS to hold the incorrect opinion that NFS doesn't suck. ;) So when smarks makes the provably false claim that NFS doesn't suck, and accuses me of being "lazy" for disagreeing with that, I'm glad I was diligent enough to keep the receipts, and generous enough to share them.
Like much of Unix, it was worse-is-better, and pretty productive for a site. (Well, until there was a problem reaching the NFS server, or until there was a problem with an application license manager that everyone needed.)
I just don't like being called "lazy" for saying "NFS Sucks" by the same guy whose window manager was so lazy it unnecessarily grabbed the X11 server all the time and locked up the window system for minutes at a time, and whose menus flickered and moved and resized and drew and tracked differently when you pinned them, since I've fairly and un-lazily written in great detail about NFS and other Sun security issues numerous times, and un-lazily implemented OPEN LOOK menus and a TNT X11/NeWS window manager that didn't suffer from all those usability problems.
Speaking of lazy menus: Correctly implemented Open Look pinned menus actually had to support two identical hot-synced menus existing and updating at the same time, in case you pinned a menu, then popped the same menu up again from the original location. The TNT menus would lazily create a second popup menu clone only when necessary (when it was already pinned and you popped it up again), and correctly supported tracking and redrawing either menu, setting the menu default item with the control key and programmatically changing other properties by delegating messages to both menus, so it would redraw the default item ring highlighting on both menus when you changed the default, or any other property.
Object oriented programming in The NeWS Toolkit was a lot more like playing with a dynamic Smalltalk interpreter, than pulling teeth with low level X11 programming in C with a compiler and linker plowing through mountains of include files and frameworks, so it was actually interactively FUN, instead of excruciatingly painful, and we could get a lot more work done in the same amount of time than X11 programmers.
Consequently, TNT had a much more thorough and spec-consistent implementation of pinned menus than OLWM, XView, OLIT, or MOOLIT, because NeWS was simply a much better window system than X11, and we were not lazy and didn't choose to selectively ignore or reinterpret the more challenging parts of the Open Look spec, like the other toolkits did because X-Windows and C made life so difficult.
See the comments in the "Clone" method and refs to the "PinnedCopy" instance variable in the PostScript TNT menu source code:
https://donhopkins.com/home/code/menu.ps
% Copy this menu for pinning. Factored out to keep the pinning code
% easier to read. The clone has a few important differences, such as
% no pin or label regardless of the pin/label of the original, but is
% otherwise as close a copy as we can manage.
TNT Open Look Menu design doc: https://donhopkins.com/home/archive/HyperLook/tnt-menu-desig...
The NFS protocol itself didn't disallow slashes in file names, so the NFS server would accept them without question from any client, silently corrupting the file system without any warning. Thanks, NFS!
Oh and here's a great party trick that will totally blow your mind:
On a Mac, use the Finder to create a folder or file whose name is today's date, like "2022/06/21", or anything with a slash in it. Cool, huh? Bet you didn't think you could do that!
Now open a shell and "ls -l" the directory containing the file you just created with slashes in its name. What just happened there?
Now try creating a folder or file whose name is the current time, or anything with colons, like "10:35:43". Ha ha!
Don't worry, it's totally harmless and won't trash your file system or backups like NFS with a Gator Box would.
https://news.ycombinator.com/item?id=20007875
DonHopkins on May 25, 2019, on: Why Does Windows Really Use Backslash as Path Sepa...
There used to be a bug in the GatorBox Mac Localtalk-to-Ethernet NFS bridge that could somehow trick Unix into putting slashes into file names via NFS, which appeared to work fine, but then down the line Unix "restore" would totally shit itself.
That was because Macs at the time (1991 or so) allowed you to use slashes (and spaces of course, but not colons, which it used as a path separator), and of course those silly Mac people, being touchy feely humans instead of hard core nerds, would dare to name files with dates like "My Spreadsheet 01/02/1991".
https://en.wikipedia.org/wiki/GatorBox
Unix-Haters Handbook
https://archive.org/stream/TheUnixHatersHandbook/ugh_djvu.tx...
Don't Touch That Slash!
UFS allows any character in a filename except for the slash (/) and the ASCII NUL character. (Some versions of Unix allow ASCII characters with the high-bit, bit 8, set. Others don't.)
This feature is great — especially in versions of Unix based on Berkeley's Fast File System, which allows filenames longer than 14 characters. It means that you are free to construct informative, easy-to-understand filenames like these:
1992 Sales Report
Personnel File: Verne, Jules
rt005mfkbgkw0.cp
Unfortunately, the rest of Unix isn't as tolerant. Of the filenames shown above, only rt005mfkbgkw0.cp will work with the majority of Unix utilities (which generally can't tolerate spaces in filenames).
However, don't fret: Unix will let you construct filenames that have control characters or graphics symbols in them. (Some versions will even let you build files that have no name at all.) This can be a great security feature — especially if you have control keys on your keyboard that other people don't have on theirs. That's right: you can literally create files with names that other people can't access. It sort of makes up for the lack of serious security access controls in the rest of Unix.
Recall that Unix does place one hard-and-fast restriction on filenames: they may never, ever contain the magic slash character (/), since the Unix kernel uses the slash to denote subdirectories. To enforce this requirement, the Unix kernel simply will never let you create a filename that has a slash in it. (However, you can have a filename with the 0200 bit set, which does list on some versions of Unix as a slash character.)
Never? Well, hardly ever.
Date: Mon, 8 Jan 90 18:41:57 PST
From: sun!wrs!yuba!steve@decwrl.dec.com (Steve Sekiguchi)
Subject: Info-Mac Digest V8 #3 5
I've got a rather difficult problem here. We've got a Gator Box running the NFS/AFP conversion. We use this to hook up Macs and Suns. With the Sun as a AppleShare File server. All of this works great!

Now here is the problem, Macs are allowed to create files on the Sun/Unix fileserver with a "/" in the filename. This is great until you try to restore one of these files from your "dump" tapes, "restore" core dumps when it runs into a file with a "/" in the filename. As far as I can tell the "dump" tape is fine.

Does anyone have a suggestion for getting the files off the backup tape?
Thanks in Advance,
Steven Sekiguchi Wind River Systems
sun!wrs!steve, steve@wrs.com Emeryville CA, 94608
Apparently Sun's circa 1990 NFS server (which runs inside the kernel) assumed that an NFS client would never, ever send a filename that had a slash inside it and thus didn't bother to check for the illegal character. We're surprised that the files got written to the dump tape at all. (Then again, perhaps they didn't. There's really no way to tell for sure, is there now?)

The TrueNAS people (ixsystems) have a patch to bring it to Linux and ZFS; though from what I've heard upstream LKML lists aren't too enthused since they'd rather see this being used by an in-kernel filesystem.
All those IP-based ACLs are suddenly useful...
“NLM (the Network Lock Manager). This allows the client to request a byte-range lock on a given file (identified using an NFS file handle), and allows the server to grant it (or not), either immediately or later. Naturally this is an explicitly stateful protocol, as both the client and server must maintain the same list of locks for each client.”*
There is no “the client”, so if clients have to maintain that information, how is it distributed to all clients (including those that will make their first request in the future)? How does the server even know all clients, given the statelessness of the protocol? Or does that locking only work for requests from the same server? Or does the client keep that information only so that it can unlock the ranges when it discovers the process that locked the range exits/crashed? Is it even correct to assume such range locks can’t be created by another process than the one that will delete them (say after the first process forked)?
https://news.ycombinator.com/item?id=25156006
https://en.wikipedia.org/wiki/Jordan_Hubbard#rwall_incident
>rwall incident
>On March 31, 1987 Hubbard executed an rwall command expecting it to send a message to every machine on the network at University of California, Berkeley, where he headed the Distributed Unix Group. The command instead began broadcasting Hubbard's message to every machine on the internet and was stopped after Hubbard realised the message was being broadcast remotely after he received complaints from people at Purdue University and University of Texas. Even though the command was terminated, it resulted in Hubbard receiving 743 messages and complaints, including one from the Inspector General of ARPAnet.
I was logged in on my Sun workstation "tumtum" when it happened, so I received his rwall too, and immediately sent him a humorous email with the subject of "flame flame flame" which I've lost in the intervening 35 years, but I still have a copy of his quick reply:
From: Jordan K. Hubbard <jkh%violet.Berkeley.EDU@berkeley.edu>
Date: Tue, Mar 31, 1987, 11:02 PM
To: Don Hopkins <don@tumtum.cs.umd.edu>
Subject: re: flame flame flame
Thanks, you were nicer than most.. Here's the stock letter I've been
sending back to people:
Thank you, thank you..
Now if I can only figure out why a lowly machine in a basement somewhere
can send broadcast messages to the entire world. Doesn't seem *right*
somehow.
Yours for an annoying network.
Jordan
P.S. I was actually experimenting to see exactly now bad a crock RPC was.
I'm beginning to get an idea. I look forward to your flame.
Jordan
Here's the explanation he sent to hackers_guild, and some replies from old net boys like Milo Medin (who said that Dennis G. Perry, the program manager of the Arpanet in the Information Science and Technology Office of DARPA, said they would kick UCB off the Arpanet if it ever happened again), Mark Crispin (who presciently proposed cash rewards for discovering and disclosing security bugs), and Dennis G. Perry himself:

From: Jordan K. Hubbard <jkh%violet.Berkeley.EDU@berkeley.edu>
Date: April 2, 1987
Subject: My Broadcast
By now, many of you have heard of (or seen) the broadcast message I sent to
the net two days ago. I have since received 743 messages and have
replied to every one (either with a form letter, or more personally
when questions were asked). The intention behind this effort was to
show that I wasn't interested in doing what I did maliciously or in
hiding out afterwards and avoiding the repercussions. One of the
people who received my message was Dennis Perry, the Inspector General
of the ARPAnet (in the Pentagon), and he wasn't exactly pleased.
(I hear his Interleaf windows got scribbled on)
So now everyone is asking: "Who is this Jordan Hubbard, and why is he on my
screen??"
I will attempt to explain.
I head a small group here at Berkeley called the "Distributed Unix Group".
What that essentially means is that I come up with Unix distribution software
for workstations on campus. Part of this job entails seeing where some of
the novice administrators we're creating will hang themselves, and hopefully
prevent them from doing so. Yesterday, I finally got around to looking
at the "broadcast" group in /etc/netgroup which was set to "(,,)". It
was obvious that this was set up for rwall to use, so I read the documentation
on "netgroup" and "rwall". A section of the netgroup man page said:
...
Any of three fields can be empty, in which case it signifies
a wild card. Thus
universal (,,)
defines a group to which everyone belongs. Field names that ...
...
Now "everyone" here is pretty ambiguous. Reading a bit further down, one
sees discussion on yellow-pages domains and might be led to believe that
"everyone" was everyone in your domain. I know that rwall uses point-to-point
RPC connections, so I didn't feel that this was what they meant, just that
it seemed to be the implication.
Reading the rwall man page turned up nothing about "broadcasts". It doesn't
even specify the communications method used. One might infer that rwall
did indeed use actual broadcast packets.
Failing to find anything that might suggest that rwall would do anything
nasty beyond the bounds of the current domain (or at least up to the IMP),
I tried it. I knew that rwall takes awhile to do its stuff, so I left
it running and went back to my office. I assumed that anyone who got my
message would let me know.. Boy, was I right about that!
After the first few mail messages arrived from Purdue and Utexas, I begin
to understand what was really going on and killed the rwall. I mean, how
often do you expect to run something on your machine and have people
from Wisconsin start getting the results of it on their screens?
All of this has raised some interesting points and problems.
1. Rwall will walk through your entire hosts file and blare at anyone
and everyone if you use the (,,) wildcard group. Whether this is a bug
or a feature, I don't know.
2. Since rwall is an RPC service, and RPC doesn't seem to give a damn
who you are as long as you're root (which is trivial to be, on a work-
station), I have to wonder what other RPC services are open holes. We've
managed to do some interesting, unauthorized, things with the YP service
here at Berkeley, I wonder what the implications of this are.
3. Having a group called "broadcast" in your netgroup file (which is how
it comes from sun) is just begging for some novice admin (or operator
with root) to use it in the mistaken belief that he/she is getting to
all the users. I am really surprised (as are many others) that this has
taken this long to happen.
4. Killing rwall is not going to solve the problem. Any fool can write
rwall, and just about any fool can get root priviledge on a Sun workstation.
It seems that the place to fix the problem is on the receiving ends. The
only other alternative would be to tighten up all the IMP gateways to
forward packets only from "trusted" hosts. I don't like that at all,
from a standpoint of reduced convenience and productivity. Also, since
many places are adding hosts at a phenominal rate (ourselves especially),
it would be hard to keep such a database up to date. Many perfectly well-
behaved people would suffer for the potential sins of a few.
I certainly don't intend to do this again, but I'm very curious as to
what will happen as a result. A lot of people got wall'd, and I would think
that they would be annoyed that their machine would let someone from the
opposite side of the continent do such a thing!
Jordan Hubbard
jkh@violet.berkeley.edu (ucbvax!jkh)
Computer Facilities & Communications.
U.C. Berkeley
From: Milo S. Medin <medin@orion.arpa>
Date: Apr 6, 1987, 5:06 AM
Actually, Dennis Perry is the head of DARPA/IPTO, not a pencil pusher
in the IG's office. IPTO is the part of DARPA that deals with all
CS issues (including funding for ARPANET, BSD, MACH, SDINET, etc...).
Calling him part of the IG's office on the TCP/IP list probably didn't
win you any favors. Coincidentally I was at a meeting at the Pentagon
last Thursday that Dennis was at, along with Mike Corrigan (the man
at DoD/OSD responsible for all of DDN), and a couple other such types
discussing Internet management issues, when your little incident
came up. Dennis was absolutely livid, and I recall him saying something
about shutting off UCB's PSN ports if this happened again. There were
also reports about the DCA management types really putting on the heat
about turning on Mailbridge filtering now and not after the buttergates
are deployed. I don't know if Mike St. Johns and company can hold them
off much longer. Sigh... Mike Corrigan mentioned that this was the sort
of thing that gets networks shut off. You really pissed off the wrong
people with this move!
Dennis also called up some VP at SUN and demanded this hole
be patched in the next release. People generally pay attention
to such people.
Milo
From: Mark Crispin <MRC%PANDA@sumex-aim.stanford.edu>
Date: Mon, Apr 6, 1987, 10:15 AM
Dan -
I'm afraid you (and I, and any of the other old-timers who
care about security) are banging your head against a brick wall.
The philsophy behind Unix largely seems quite reminiscent of the
old ITS philsophy of "security through obscurity;" we must
entrust our systems and data to a open-ended set of youthful
hackers (the current term is "gurus") who have mastered the
arcane knowledge.
The problem is further exacerbated by the multitude of slimy
vendors who sell Unix boxes without sources and without an
efficient means of dealing with security problems as they
develop.
I don't see any relief, however. There are a lot of
politics involved here. Some individuals would rather muzzle
knowledge of Unix security problems and their fixes than see them
fixed. I feel it is *criminal* to have this attitude on the DDN,
since our national security in wartime might ultimately depend
upon it. If there is such a breach, those individuals will be
better off if the Russians win the war, because if not there will
be a Court of Inquiry to answer...
It may be necessary to take matters into our own hands, as
you did once before. I am seriously considering offering a cash
reward for the first discoverer of a Unix security bug, provided
that the bug is thoroughly documented (with both cause and fix).
There would be a sliding cash scale based on how devastating the
bug is and how many vendors' systems it affects. My intention
would be to propagate the knowledge as widely as possible with
the express intension of getting these bugs FIXED everywhere.
Knowledge is power, and it properly belongs in the hands of
system administrators and system programmers. It should NOT be
the exclusive province of "gurus" who have a vested interest in
keeping such details secret.
-- Mark --
PS: Crispin's definition of a "somewhat secure operating system":
A "somewhat secure operating system" is one that, given an
intelligent system management that does not commit a blunder that
compromises security, would withstand an attack by one of its
architects for at least an hour.
Crispin's definition of a "moderately secure operating system": a
"moderately secure operating system" is one that would withstand
an attack by one of its architects for at least an hour even if
the management of the system are total idiots who make every
mistake in the book.
-------
From: Dennis G. Perry <PERRY@vax.darpa.mil>
Date: Apr 6, 1987, 3:19 PM
Jordan, you are right in your assumptions that people will get annoyed
that what happened was allowed to happen.
By the way, I am the program manager of the Arpanet in the Information
Science and Technology Office of DARPA, located in Roslin (Arlington), not
the Pentagon.
I would like suggestions as to what you, or anyone else, think should be
done to prevent such occurances in the furture. There are many drastic
choices one could make. Is there a reasonable one? Perhaps some one
from Sun could volunteer what there action will be in light of this
revelation. I certainly hope that the community can come up with a good
solution, because I know that when the problem gets solved from the top
the solutions will reflect their concerns.
Think about this situation and I think you will all agree that this is
a serious problem that could cripple the Arpanet and anyother net that
lets things like this happen without control.
dennis
-------
Also:

> To provide a greater degree of compatibility with NFSv3, which identified users and groups by 32-bit unsigned user identifiers and group identifiers, owner and group strings that consist of ASCII-encoded decimal numeric values with no leading zeros can be given a special interpretation by clients and servers that choose to provide such support. The receiver may treat such a user or group string as representing the same user as would be represented by an NFSv3 uid or gid having the corresponding numeric value.
I'm not sure how common this extension is, but at least the Linux server and client support it out of the box. Isilon also supports it, but it must be explicitly enabled.
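For illustration, the special interpretation boils down to something like this (a simplified Python sketch; a real server or idmapper handles more cases than shown):

    import pwd

    NOBODY = 65534

    def owner_string_to_uid(owner):
        # Numeric form with no leading zeros: NFSv3-compatible fallback.
        if owner.isdigit() and (owner == "0" or not owner.startswith("0")):
            return int(owner)
        # Otherwise it should be "user@domain" and needs real idmapping.
        user = owner.split("@", 1)[0]
        try:
            return pwd.getpwnam(user).pw_uid
        except KeyError:
            return NOBODY

    print(owner_string_to_uid("1000"))              # 1000, taken literally
    print(owner_string_to_uid("root@example.com"))  # 0 on most systems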
https://web.archive.org/web/19980130085039/http://www.kernel...
S3 and EFS actually are quite different. Files on EFS are update-able, rename-able and link-able (I.e what’s expected from a file system), while S3 objects are immutable once they are created. This comes from the underlying data structures. EFS uses inodes and directories while S3 is more of a flat map.
Protocol-wise EFS uses standard NFS 4.1. We added some optional innovations outside the protocol that you can use through our mount helper (mount.efs). This includes in-transit encryption with TLS (you can basically talk TLS to our endpoint and we will detect that automatically), and we support strong client auth using SigV4 over x509 client certificate.
> intr / nointr This option is provided for backward compatibility. It is ignored after kernel 2.6.25.
(IIRC when that change went in there were also some related changes to more reliably make processes blocked on a hung mount SIGKILL'able)
It is very common. I’m not aware of a v4 server that does not support this.
Originally v4 went all in on Kerberos (GSSAPI technically) to provide strong multi user auth. This is the reason that users and groups are represented as strings.
This approach works reasonably well on Windows with SMB since you have AD there giving you Kerberos and a shared (LDAP) directory. AD is also deeply integrated in the OS for things like credential management and provisioning.
The approach did not work so well on Linux where not everyone is running AD or even some kind of directory. This caused the protocol designers to make Kerberos optional in v4.1. I guess the spec authors already knew that Kerberos was going to be difficult, because I just checked and the numeric-string-user-as-posix-id workaround was already present in the original 4.0 spec.
My favorite criticism from that paper is that NFS clients reused the source port so that the server can detect whether a new connection is the same client or not. This confuses stateful packet filtering on the network because both connections now have the same 5-tuple and packets on the new connection can look like out of window packets on the old connection. This can get connections blackholed depending on the network. This was fixed a few years ago in the Linux client for NFS v4.1, since that version of the protocol already has a different way to identify clients. Before this was fixed, EFS had to document a workaround.
Will EFS be updated to use the NFS-TLS RFC once it settles down some?
* https://datatracker.ietf.org/doc/html/draft-ietf-nfsv4-rpc-t...
https://www.delltechnologies.com/asset/en-us/products/storag...
Used in lots of places if they don't want to go GPFS, Lustre, maybe CephFS nowadays. Dell-EMC Isilon is used in lots of places for NFS (and SMB): I worked at a place that had >10PB in one file system/namespace (each node both serves traffic and has disk/flash, replicated over a back-end).
> […] we use a centralised FreeIPA server that allows all machines in the cluster have the same UID/GID mapping everywhere.
(Open)LDAP is still very handy as well and used in many places. (AD is technically LDAP+Kerberos.)
"SMB1 is an extremely chatty protocol, which is not such an issue on a local area network (LAN) with low latency. It becomes very slow on wide area networks (WAN) as the back and forth handshake of the protocol magnifies the inherent high latency of such a network. Later versions of the protocol reduced the high number of handshake exchanges."
I can't commit on a public forum for obvious reasons but we'll definitely take a serious look at this, especially when the Linux client starts supporting this. We did consult with the authors of that draft RFC earlier and it should be relatively easy for us to adopt this.
* https://patchwork.kernel.org/project/cifs-client/cover/16503...
* https://www.freshports.org/sysutils/nfs-over-tls/
Activity on the NFSv4 mailing list:
* https://mailarchive.ietf.org/arch/browse/nfsv4/
But no recent commits to the draft:
* https://github.com/chucklever/i-d-rpc-tls
¯\_(ツ)_/¯
I don't know, I hope it will.
Not to go on too much of a tangent, and at the risk of sounding like my employer's fanboy, but one of the great things about working at AWS (I'm being honest, and yes we are hiring SDEs and PMs) is that we 100% focus on the customer. When our customers told us they needed encryption in transit, we figured out we could simply offer them transport-level TLS independent from the application-level RPC protocol. It may not have been the standards-compliant approach, but our customers have been enjoying fast reliable encryption for over 4 years now [1]. It solves a real problem because customers have compliance requirements.
[1] https://aws.amazon.com/about-aws/whats-new/2018/04/amazon-ef...
Everything gets squashed in NFSv4 until idmapd is configured on both the client and the server, and they are set to the same "domain" (by default everything in the FQDN except for the simple host name).
Assuming this is up, everything will be unsquashed, and it will behave like NFSv3.
Here is the status:
"This one had to be paused for a bit to work out some issues around using a wider type to hold the epoch value, to accomodate some DTLS-SCTP use cases involving associations expected to remain up for years at a time. https://github.com/tlswg/dtls13-spec/issues/249 ends up covering most of the topics, though the discussion is a bit jumbled. We have a proposed solution with almost all the signoffs needed, and should be attempting to confirm this approach at the session at IETF 112 next week...
"I'm sorry that these have been taking so long; these delays were unexpected."
Technically the server doesn't need to have a UID/GID database that's aligned with the client; what's required is that all clients of the same server are aligned with each other. The server takes the numerical UIDs/GIDs from the RPC sent by the client and performs POSIX-style permission checks against the owner UID, owner GID, and mode bits stored in the inode of the file or directory. The server doesn't need to know which user the UID corresponds to.
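As a rough sketch (not actual server code), a write-permission check of that kind only needs the numbers from the AUTH_SYS credential plus the inode:

    /* Minimal sketch of a POSIX-style access check: rq_uid/rq_gid are the
     * numeric IDs taken straight from the AUTH_SYS credential in the RPC;
     * no user-name lookup is needed anywhere. */
    #include <sys/stat.h>
    #include <sys/types.h>

    int may_write(const struct stat *ino, uid_t rq_uid, gid_t rq_gid)
    {
        if (rq_uid == 0)
            return 1;                               /* root, assuming it isn't squashed */
        if (rq_uid == ino->st_uid)
            return (ino->st_mode & S_IWUSR) != 0;   /* owner bits */
        if (rq_gid == ino->st_gid)
            return (ino->st_mode & S_IWGRP) != 0;   /* group bits */
        return (ino->st_mode & S_IWOTH) != 0;       /* other bits */
    }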
Out of curiosity, did you ever try Kerberized NFS for extra security? We tested it out a while back (and still use it in some small circumstances) but never got it stable enough for production use.
Side-note: I wouldn't be surprised if LDAP+NFS is still pretty common across universities, either as a holdover from Sun days or just out of practicality.
It's not just about reads and writes - doing a stateless protocol for a block device is fine. Think more about unlink...
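For example, the classic open-then-unlink idiom is trivial on a local filesystem but awkward when the server keeps no per-client open state; a small sketch:

    /* Sketch of the pattern a stateless protocol struggles with: unlink a
     * file while keeping it open, then keep using it. Locally the data
     * survives until the last descriptor is closed; an NFS client has to
     * fake this (the ".nfsXXXX" silly-rename trick) because the server has
     * no idea the file is still open anywhere. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("scratch.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        unlink("scratch.tmp");          /* name is gone, data must live on */

        write(fd, "still here\n", 11);
        lseek(fd, 0, SEEK_SET);

        char buf[32];
        printf("read back %zd bytes\n", read(fd, buf, sizeof buf));

        close(fd);                      /* only now may the blocks be freed */
        return 0;
    }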
I wonder if it would maintain a speed advantage today.
"NetWare dominated the network operating system (NOS) market from the mid-1980s through the mid- to late-1990s due to its extremely high performance relative to other NOS technologies. Most benchmarks during this period demonstrated a 5:1 to 10:1 performance advantage over products from Microsoft, Banyan, and others. One noteworthy benchmark pitted NetWare 3.x running NFS services over TCP/IP (not NetWare's native IPX protocol) against a dedicated Auspex NFS server and an SCO Unix server running NFS service. NetWare NFS outperformed both 'native' NFS systems and claimed a 2:1 performance advantage over SCO Unix NFS on the same hardware."
Of course this is irritating if you're blocked waiting for something incidental, like your shell doing a search of PATH. In those cases you could just control-C and continue doing what you wanted to do (as long as it didn't actually need that NFS server).
However, I can see that it would be difficult to implement interruptibility in various layers of the kernel.
This was maintained using YP/NIS. But Sun was too big for a single YP/NIS domain, so there was a hack where each YP/NIS master was populated via some kind of uber-master database. At least at one point, this consisted of plain text files on a filesystem that was NFS-mounted by every YP/NIS master....
This was all terribly insecure. Since everybody had root on their own workstations, you could `su root` and then `su somebody` to get processes running with their UID, and then you could read and write all their files over NFS. But remember, this was back in the day when we sent passwords around in the clear, we used insecure tools like telnet and ftp and BSD tools like rsh/rcp/rlogin. So NFS was "no more insecure" than anything else running on the network. But that was ok, because everything was behind a firewall. (Some sarcasm in those last bits, in case it wasn't obvious.)
AFAICT the problem with "intr" wasn't that it was impossible to implement in the kernel, but rather an application-correctness issue, as few applications are prepared to handle EINTR from any I/O syscall. With "nointr", on the other hand, the process would be blocked in uninterruptible sleep and would be impossible to kill.
However, if the process is about to be killed by the signal anyway, then not handling EINTR is irrelevant. Thus in 2.6.25 a new process state, TASK_KILLABLE, was introduced (https://lwn.net/Articles/288056/ ), which is a bit like TASK_UNINTERRUPTIBLE except that the task can be woken by a fatal signal, and the NFS client code was converted to use it in https://lkml.org/lkml/2007/12/6/329 . So the end result is that the process can be killed with Ctrl-C (as long as it hasn't installed a non-default SIGINT handler, so that the signal is actually fatal), but doesn't need to handle EINTR for all I/O syscalls.
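To illustrate what "prepared to handle EINTR" means in practice, here's a minimal sketch of the retry wrapper that every blocking I/O call would need under "intr"; most programs were simply never written this way:

    /* Minimal sketch: retrying a read() that was interrupted by a signal.
     * Under "intr", any blocking NFS I/O could return -1 with errno EINTR,
     * so code like this would be needed around every blocking syscall. */
    #include <errno.h>
    #include <unistd.h>

    ssize_t read_retry(int fd, void *buf, size_t len)
    {
        ssize_t n;
        do {
            n = read(fd, buf, len);
        } while (n < 0 && errno == EINTR);  /* interrupted: just try again */
        return n;
    }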
> Side-note: I wouldn't be surprised if LDAP+NFS is still pretty common across universities, either as a holdover from Sun days or just out of practicality.
Yes, absolutely. Most large enterprises, be it universities or big companies, have some kind of centralized directory (nowadays probably Microsoft AD), and machines (servers and end-user clients) are then configured to look up user and group info from there.
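The nice property is that the numeric IDs stored on the NFS server stay meaningful everywhere, because every host resolves them through the same directory behind NSS. A tiny sketch; the UID 10042 is a made-up example:

    /* Sketch: whichever backend NSS is pointed at (NIS, LDAP, FreeIPA, AD,
     * sssd, ...), getpwuid() resolves the same numeric UID to the same
     * account on every host, so ownership on an NFS export looks consistent. */
    #include <pwd.h>
    #include <stdio.h>

    int main(void)
    {
        struct passwd *pw = getpwuid(10042);    /* goes through NSS -> directory */
        if (pw)
            printf("uid 10042 is %s (home %s)\n", pw->pw_name, pw->pw_dir);
        else
            printf("uid 10042 is unknown on this host\n");
        return 0;
    }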
Last I checked, it spawns a HAProxy on the client and points the in-kernel NFS client at it over lo. Is this still the case?
And, out of curiosity: now that EFS claims 600us average read latency, would the extra hop matter?
https://ask.wireshark.org/users/12/guy-harris/
https://www.wireshark.org/docs/wsug_html_chunked/ChIntroHist...
Think about the environment it was originally used in — large organizations, computers which cost as much as a car, LANs which aren't easily accessible (e.g. the Unix people have access, but laptops are an expensive oddity and the sales people are probably sitting in front of a DOS box or shelled into that Unix server), etc. It's more defensible when your Unix administrator is going to configure each of the servers to use the same NIS user directory.
All of that broke down when IP networking became the default, every desk in the building had a network port, and things like WiFi and laptops completely blew away the idea that the clients were managed by a single administrative group.
Note that I'm not arguing that Sun was a leader in security, but they did make some efforts that other companies didn't.
Back in the day of 1 MHz machines, with 3 Mbps net, having your home dir on a network file system was a tad hopeful - or hopeless ...
Funnily enough I've been writing a pure-Smalltalk X11 protocol implementation recently, for Squeak, and it starts to have some of the feel you describe. It generates Smalltalk code from the XML xcbproto definitions. It's at the point now where you can send X11 requests interactively in a Workspace, etc., which is fun ("playing with a dynamic Smalltalk interpreter"), and I'm working on integrating it with Morphic. Anyway, thought you might enjoy the idea.