zlacker

Of course, the first commenter "willy" repeats the canard that statelessness makes no sense:

> The very notion of a stateless filesystem is ridiculous. Filesystems exist to store state.

It's the protocol that's stateless, not the filesystem. I thought the article made a reasonable attempt to explain that.

Overall the article is reasonable but it omits one of the big issues with NFSv2, which is synchronous writes. Those Sun NFS implementations were based on Sun's RPC system; the server was required not to reply until the write had been committed to stable storage. There was a mount option to disable this, but if you enabled it, it exposed you to data corruption. Certain vendors (SGI, if I recall correctly) at some point claimed their NFS was faster than Sun's, but it implemented asynchronous writes. This resulted in the expected arguments over protocol compliance and reliability vs. performance.

This phenomenon led to various hardware "NFS accelerator" solutions that put an NVRAM write cache in front of the disk in order to speed up synchronous writes. I believe Legato and the still-existing NetApp were based on such technology. Eventually the synchronous writes issue was resolved, possibly by NFSv3, though the details escape me.

replies(5): >>wahern+Qf >>chasil+Ok >>DonHop+0B >>mprovo+AB >>kovers+kG1

>>smarks+(OP)
NFS is basically the original S3. Both are useful for similar scenarios (maybe a slightly narrower subset for NFS, especially in later incarnations), and the semantics of both break down in similar ways.

I've always just presumed the development of EFS recapitulated the evolution of NFS, in many cases quite literally, considering the EFS protocol is a flavor of NFS. S3 buckets are just blobs with GUIDs in a flat namespace, which is literally what stateless NFS is--every "file" has a persistent UID (GUID if you assume host identifiers are unique), providing a simple handle for submitting idempotent block-oriented read and write operations. Theoretically, EFS could just be a fairly simple interface over S3, especially if you can implicitly wave away many of the caveats (e.g. wrt shared writes) by simply pointing out they have existed and mostly been tolerated in NFS environments for decades.
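To make the "persistent handle plus idempotent block operations" point concrete, here is a toy sketch. It is not the NFS or S3 wire protocol; the class, the handle tuple, and the method names are all made up for illustration. The only point is that a write naming an absolute offset can be retransmitted safely, while an append cannot.

```python
# Toy model of stateless, handle-based block I/O.
# All names here are hypothetical; this is not the NFS wire protocol.

class ToyServer:
    def __init__(self):
        # file handle -> contents. The handle is the only identifier a
        # client presents, so no per-client session state is kept.
        self.files = {}

    def write(self, handle, offset, data):
        """Idempotent: the request itself names the absolute offset."""
        buf = self.files.setdefault(handle, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data
        return len(data)

    def read(self, handle, offset, count):
        return bytes(self.files.get(handle, b"")[offset:offset + count])

    def append(self, handle, data):
        """Not idempotent: the result depends on hidden server-side state."""
        self.files.setdefault(handle, bytearray()).extend(data)


server = ToyServer()
handle = ("fsid-1", 4242, 1)  # (filesystem, inode, generation), invented

# A client that never saw the reply just retransmits the same request.
for _ in range(3):
    server.write(handle, 0, b"hello")
idempotent_result = server.read(handle, 0, 100)

other = ("fsid-1", 4243, 1)
for _ in range(3):
    server.append(other, b"hello")
appended_result = server.read(other, 0, 100)
```

After three retransmissions the offset-addressed file still holds a single "hello", while the appended file holds three copies.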

replies(1): >>geertj+t31

>>smarks+(OP)
There is a historical document by Olaf Kirch that addresses many aspects of the stateless design.

It also has some discussion of the idempotent replay cache that is also covered in the original article.

https://www.kernel.org/doc/ols/2006/ols2006v2-pages-59-72.pd...
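For anyone who hasn't run into the replay cache idea: some operations (REMOVE, for instance) are not naturally idempotent, so a retransmitted request can return a confusing error even though the first attempt succeeded. A rough sketch of the idea follows, assuming the server keys cached replies on a (client, transaction id) pair. The class and method names are invented, and a real cache would be bounded and would not survive a reboot.

```python
# Toy duplicate-request ("replay") cache. Names are hypothetical.

class ToyServer:
    def __init__(self):
        self.names = {"a.txt", "b.txt"}
        self.reply_cache = {}  # (client_id, xid) -> reply already sent

    def _remove(self, name):
        if name not in self.names:
            return "ENOENT"
        self.names.remove(name)
        return "OK"

    def remove_naive(self, name):
        # Every retransmission is executed again from scratch.
        return self._remove(name)

    def remove_cached(self, client_id, xid, name):
        # A retransmission carries the same xid, so replay the old reply
        # instead of re-executing the operation.
        key = (client_id, xid)
        if key not in self.reply_cache:
            self.reply_cache[key] = self._remove(name)
        return self.reply_cache[key]


server = ToyServer()

# Without the cache: first reply is lost, the retry sees a bogus error.
naive_replies = [server.remove_naive("a.txt"), server.remove_naive("a.txt")]

# With the cache: the retry gets the original answer.
cached_replies = [
    server.remove_cached("client-1", 7, "b.txt"),
    server.remove_cached("client-1", 7, "b.txt"),
]
```

The cache is an optimization layered on a stateless protocol rather than a guarantee, which is part of why the paper treats it as a weak spot.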

replies(1): >>smarks+2m

>>chasil+Ok
“Why NFS Sucks” (2006), picking on a protocol that was over 20 years old at that point. Also cites “The Unix-Haters Handbook” in the abstract. Two strikes against its credibility already.

However, I did skim the paper, and it seems halfway reasonable, so I suppose I should read the whole thing. Of course nothing is above criticism, and there are many valid criticisms of NFS; but leading with “sucks” is just lazy.

replies(3): >>chasil+im >>DonHop+fH >>geertj+s71

>>smarks+2m
If you think that was bad, just listen to what Theo de Raadt had to say.

"NFSv4 is a gigantic joke on everyone....NFSv4 is not on our roadmap. It is a ridiculous bloated protocol which they keep adding crap to. In about a decade the people who actually start auditing it are going to see all the mistakes that it hides.

"The design process followed by the NFSv4 team members matches the methodology taken by the IPV6 people. (As in, once a mistake is made, and 4 people are running the test code, it is a fact on the ground and cannot be changed again.) The result is an unrefined piece of trash."

https://blog.fosketts.net/2015/02/03/vsphere-6-nfs-41-finall...

replies(2): >>smarks+Xt >>wahern+0u

>>chasil+im
OK, I read the Olaf Kirch article, and the "NFS Sucks" title is mostly clickbait. There are indeed a bunch of shortcomings in NFS that he points out, that are partially addressed by NFSv4. He also admits that (as of 2006) there isn't anything better.

Locking has historically always been a problem in NFS. Kirch mentions that NLM was designed for Posix semantics only. I frankly don't know if NLM is related to `rpc.lockd` which appeared in SunOS 4 and possibly even SunOS 3 (mid 1980s at this point) which well predates anything having to do with Posix. Part of the problem is the confused state of file locking in the Unix world, even for local files. There was BSD-style `flock` and SYSV-style `lockf` and there might even have been multiple versions of those. Implementing these in a distributed system would have been terribly complex. Even at Sun, at least through the mid 1990s, the conventional wisdom was to avoid file locking. If you really needed something that supported distributed updates, it was better to use a purpose-built network protocol.
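As a small illustration of how confused local locking already is, here is a sketch using Python's `fcntl` module, which exposes both `flock` (BSD style) and `lockf` (which Python implements on top of `fcntl` record locks). The helper name is mine. It takes an exclusive lock through one descriptor, then tries a non-blocking exclusive lock through a second descriptor opened on the same file in the same process.

```python
import fcntl
import os
import tempfile


def second_lock_succeeds(lock_fn):
    """Lock via one descriptor, then try again via a second descriptor
    that was opened independently in the same process."""
    fd, path = tempfile.mkstemp()          # opened read/write
    fd2 = os.open(path, os.O_RDWR)
    try:
        lock_fn(fd, fcntl.LOCK_EX)
        try:
            lock_fn(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError:                    # EWOULDBLOCK / EAGAIN
            return False
    finally:
        os.close(fd2)
        os.close(fd)
        os.unlink(path)


flock_second = second_lock_succeeds(fcntl.flock)
lockf_second = second_lock_succeeds(fcntl.lockf)
```

On a local Linux filesystem I would expect the second `flock` to be refused (the lock belongs to the open file description) and the second `lockf` to succeed (the lock belongs to the process). Two "file locks" with different ownership rules on one machine gives some idea of why doing this across a network was unappealing.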

One thing "willy" got right in his comment is that NFS is an example of "worse is better". In its early version, it had the benefit of being relatively simple, as acknowledged in the LWN article. This made it easy to port and reimplement and thus it became widespread.

Of course being simple means there are lots of tradeoffs and shortcomings. To address these you need to make things more complex, and now things are "ridiculous" and "bloated". Oh well.

replies(1): >>Enderb+8D

>>chasil+im
Trash or not, the demand for the features is there. OpenBSD enjoys the luxury of simply telling people who need more sophisticated features to piss off, at least until the time a protocol or interface has been hashed out and settled into a static target.

Notably, OpenBSD has an IPv6 and IPSec (including IKE) stack second to none. If OpenBSD developers actually had a need for the features provided by NFSv4, I'm sure OpenBSD would have an exceptionally polished and refined--at least along the dimensions they care about--implementation. But they don't. What they do have is relatively well-maintained NFSv3 and YP stacks (not even NIS!), because those things are important to Theo, especially for (AFAIU) maintaining the build farm and related project infrastructure.

replies(1): >>bgm197+Mx

>>wahern+0u
YP is NIS. It was renamed by Sun due to the trademark on Yellow Pages. Maybe you’re thinking of NIS+ (which was an abomination). TBH, they are both horrible for their own reasons.

replies(1): >>wahern+0K

>>smarks+(OP)
NFS originally stood for "No File Security".

The NFS protocol wasn't just stateless, but also securityless!

Stewart, remember the open secret that almost everybody at Sun knew about, in which you could tftp a host's /etc/exports (because tftp was set up by default in a way that left it wide open to anyone from anywhere reading files in /etc) to learn the name of all the servers a host allowed to mount its file system, and then in a root shell simply go "hostname foo ; mount remote:/dir /mnt ; hostname `hostname`" to temporarily change the CLIENT's hostname to the name of a host that the SERVER allowed to mount the directory, then mount it (claiming to be an allowed client), then switch it back?

That's right, the server didn't bother checking the client's IP address against the host name it claimed to be in the NFS mountd request. That's right: the protocol itself let the client tell the server what its host name was, and the server implementation didn't check that against the client's ip address. Nice professional protocol design and implementation, huh?
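Here is a rough model of the check that was missing. It is not mountd source code. The export table, the reverse-lookup table, and both function names are invented, and a dictionary stands in for a real name service. Reverse DNS can itself be spoofed, so the second function is only "less bad", not secure.

```python
# Hypothetical tables standing in for /etc/exports and a name service.
EXPORTS = {"/export/home": {"trusted.example.com"}}
REVERSE_LOOKUP = {
    "192.0.2.10": "trusted.example.com",
    "203.0.113.66": "attacker.example.net",
}


def naive_mount_allowed(path, claimed_hostname, peer_ip):
    # Believes whatever name the client put in the request; the peer
    # address is never consulted.
    return claimed_hostname in EXPORTS.get(path, set())


def checked_mount_allowed(path, claimed_hostname, peer_ip):
    # Ignores the claimed name and derives one from the peer address.
    actual = REVERSE_LOOKUP.get(peer_ip)
    return actual is not None and actual in EXPORTS.get(path, set())


# The attacker has run "hostname trusted.example.com" before mounting.
spoofed = ("/export/home", "trusted.example.com", "203.0.113.66")
naive_result = naive_mount_allowed(*spoofed)
checked_result = checked_mount_allowed(*spoofed)
```

The naive check admits the spoofed request and the address-based check rejects it.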

Yes, that actually worked, because the NFS protocol laughably trusted the CLIENT to identify its host name for security purposes. That level of "trust" was built into the original NFS protocol and implementation from day one, by the geniuses at Sun who originally designed it. The network is the computer is insecure, indeed.

And most engineers at Sun knew that (and many often took advantage of it). NFS security was a running joke, thus the moniker "No File Security". But Sun proudly shipped it to customers anyway, configured with terribly insecure defaults that let anybody on the internet mount your file system. (That "feature" was undocumented, of course.)

While I was a summer intern at Sun in 1987, somebody at Sun laughingly told me about it, explaining that was how everybody at Sun read each other's email. So I tried it out by using that technique to mount remote NFS directories from Rutgers, CMU, and UMD onto my workstation at Sun. It was slow but it worked just fine.

I told my friend Ron Natalie at Rutgers, who was Associate Director of CCIS at the time, that I was able to access his private file systems over the internet from Sun, and he rightfully freaked out, because as a huge Sun customer in charge of security, nobody at Sun had ever told him about how incredibly insecure NFS actually was before, despite all Sun's promises. (Technically I was probably violating the terms of my NDA with Sun by telling him that, but tough cookies.)

For all Sun's lip service about NFS and networks and computers and security, it was widely known internally at Sun that NFS had No File Security, which was why it was such a running inside joke that Sun knowingly shipped it to their customers with such flagrantly terrible defaults, but didn't care to tell anyone who followed their advice and used their software that they were leaving their file systems wide open.

Here is an old news-makers email from Ron from Interop88 that mentions mounting NFS directories over the internet. By then I'd already told him about NFS's complete lack of security, so he had probably secured his own servers somewhat by overriding the tftp defaults, and was able to mount the partition because he remembered one of the host names in /etc/exports and didn't need to fetch it with tftp:

>From: Ron Natalie <elbereth.rutgers.edu!ron.rutgers.edu!ron@rutgers.edu> Date: Wed, Oct 5, 1988, 4:09 AM To: NeWS-makers@brillig.umd.edu

>I love a trade show that I can walk into almost any booth and get logged in at reasonable speed to my home machine. One neat experiment was that The Wollongong Group provided a Sun 3/60C for a public mail reading terminal. It was lacking a windowing system, so I decided to see if I could start up NeWS on it. In order to do that, I NFS mounted the /usr partition from a Rutgers machine and Symlinked /usr/NeWS to the appropriate directory. This worked amazingly well.

>(The guys from the Apple booth thought that NeWS was pretty neat, I showed them how to change the menus by just editing the user.ps file.)

>-Ron

I posted about this fact earlier:

https://news.ycombinator.com/item?id=21102724

>DonHopkins on Sept 28, 2019 | parent | context | favorite | on: A developer goes to a DevOps conference

>I love the incredibly vague job title "Member, Technical Staff" I had at Sun. It could cover anything from kernel hacking to HVAC repair!

>At least I had root access to my own workstation (and everybody else's in the company, thanks to the fact that NFS actually stood for No File Security).

>[In the late 80's and early 90's, NFSv2 clients could change their hostname to anything they wanted before doing a mount ("hostname foobar; mount server:/foobar /mnt ; hostname original"), and that name would be sent in the mount request, and the server trusted the name the client claimed to be without checking it against the ip address, then looked it up in /etc/exports, and happily returned a file handle.

>If the NFS server or any of its clients were on your local network, you could snoop file handles by putting your ethernet card into promiscuous mode.

>And of course NFS servers often ran TFTP servers by default (for booting diskless clients), so you could usually read an NFS server's /etc/exports file to find out what client hostnames it allowed, then change your hostname to one of those before mounting any remote file system you wanted from the NFS server.

>And yes, TFTP and NFS and this security hole you could drive the space shuttle through worked just fine over the internet, not just the local area network.]

Sun's track record on network security isn't exactly "stellar" and has "burned" a lot of people (pardon the terrible puns, which can't hold a candle to IBM's "Eclipse" pun). The other gaping security hole at Sun I reported was just after the Robert T Morris Worm incident, as I explained to Martha Zimet:

>Oh yeah, there was that one time I accidentally hacked sun.com’s sendmail server, the day after the Morris worm.

>The worm was getting in via sendmail’s DEBUG command, which was usually enabled by default.

>One of the first helpful responses that somebody emailed around was a suggestion for blocking the worm by editing your sendmail binary, searching for DEBUG, and replacing the D with a NULL character.

>Which the genius running sun.com apparently did.

>That had the effect of disabling the DEBUG command, but enabling the zero-length string command!

>So as I often did, I went “telnet sun.com 25” to EXPN some news-makers email addresses that had been bouncing, and first hit return a couple of times to flush the telnet negotiation characters it sends, so the second return put it in debug mode, and the EXPN returned a whole page full of diagnostic information I wasn’t expecting!

>I reported the problem to postmaster@sun.com and they were like “sorry oops”.
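For anyone wondering why that binary patch backfired: C strings end at the first NUL byte, so overwriting the "D" leaves a command name that is the empty string. A toy model follows. The command table and lookup function are invented, since I don't have the actual sendmail table in front of me, but the string behavior is the point.

```python
def c_string(raw):
    """What C string functions see: the bytes before the first NUL."""
    return raw.split(b"\0", 1)[0]


def lookup(command_table, typed):
    """Case-insensitive match of the typed verb against the table."""
    for raw_name, action in command_table:
        if c_string(raw_name).lower() == typed.lower():
            return action
    return "unknown"


ORIGINAL = [(b"HELO", "helo"), (b"EXPN", "expn"), (b"DEBUG", "debug")]
# Same table after replacing the 'D' of DEBUG with a NUL byte.
PATCHED = [(b"HELO", "helo"), (b"EXPN", "expn"), (b"\0EBUG", "debug")]

debug_after_patch = lookup(PATCHED, b"DEBUG")   # the worm's command
empty_before_patch = lookup(ORIGINAL, b"")      # a bare return
empty_after_patch = lookup(PATCHED, b"")        # a bare return, patched
```

In this model the patched table no longer answers to DEBUG, but an empty line now lands on the debug action.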

I've mentioned that one a couple of times before:

https://news.ycombinator.com/item?id=29250313

https://news.ycombinator.com/item?id=21101321

>>smarks+(OP)
Yes, Legato's first product was the Prestoserve NFS accelerator card (in 1989!).[0] NetApp's implementation mirrored the cache across two servers in a cluster with an interconnect.

NFSv3 "fixed" the write issue by adding a separate COMMIT RPC:

"The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns. This specification defines a method for committing these unsafe writes to stable storage in a reliable way."[1]

[0] https://twitter.com/aka_pugs/status/1225665691472166912 [1] https://datatracker.ietf.org/doc/html/rfc1813#section-1.1
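Roughly how the unsafe-write-plus-COMMIT scheme hangs together, as a toy simulation rather than real NFS code: RFC 1813 has WRITE and COMMIT both return a write verifier that changes when the server reboots, so a client can tell whether uncommitted data might have been lost and resend it. The class and function names below are mine, and a boot counter stands in for the verifier cookie.

```python
class ToyServer:
    def __init__(self):
        self.stable = {}     # offset -> data, survives a crash
        self.cache = {}      # offset -> data, lost on a crash
        self.verifier = 1    # stand-in for the per-boot write verifier

    def write_unstable(self, offset, data):
        # Reply immediately, before the data reaches stable storage.
        self.cache[offset] = data
        return self.verifier

    def commit(self):
        # Flush everything written so far to stable storage.
        self.stable.update(self.cache)
        self.cache.clear()
        return self.verifier

    def crash_and_reboot(self):
        self.cache.clear()
        self.verifier += 1


def client_write_all(server, blocks, crash_before_commit=False):
    """Send unstable writes, COMMIT, and resend if the verifier changed."""
    attempts = 0
    while True:
        attempts += 1
        seen = {server.write_unstable(off, data)
                for off, data in blocks.items()}
        if crash_before_commit:
            server.crash_and_reboot()   # simulate losing the cache
            crash_before_commit = False
        seen.add(server.commit())
        if len(seen) == 1:              # same verifier throughout: safe
            return attempts


server = ToyServer()
blocks = {0: b"aaaa", 4: b"bbbb"}
attempts = client_write_all(server, blocks, crash_before_commit=True)
```

The client has to keep its own copy of the data until COMMIT succeeds with an unchanged verifier, which is the price of letting the server answer early.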

>>smarks+Xt
The funny thing is that we're running our large NFS servers on OmniOS (with Linux NFS clients), as plugins for a certain large PHP-based blogging platform love to sprinkle LOCK_EX flocks all over the place.

Sadly with a Linux NFS server, lock state eventually corrupts itself to extinction but OmniOS can tick along past 300 days uptime without a problem.

Of course... these issues only show up under production levels of load, and have never been able to be distilled into a reproducible test case.

FML, and FNL ( = fricking NFS locking :P)

>>smarks+2m
Stewart, I think "sucks" is a pretty fair description of a protocol that actually trusted the client to tell the server what its host name is, before the server checked that the host name appears in /etc/exports, without verifying the client's ip address. On a system that makes /etc/exports easily publicly readable via tftp by default.

Did you find my criticism of how X-Windows sucks in the Unix Haters Handbook as unfair and un-credible and lazy as you found the book's criticism of NFS? Or my criticism of OLWM and XBugTool, which also both sucked?

https://web.archive.org/web/20000423081727/http://www.art.ne...

Did you ever fix those high priority OLWM bugs I reported with XBugTool that OLWM unnecessarily grabbed the X11/NeWS server all the time and caused the input queue to lock up so you couldn't do anything for minutes at a time? And that related bug caused by the same grab problem that the window system would freeze if you pressed the Help key while resizing a window? Or manage to get OLWM's showcase Open Look menus to pin up without disappearing for an instant then reappearing in a different place, with a totally different looking frame around it, and completely different mouse tracking behavior? That unnecessary song and dance completely ruined the "pinned menu" user experience and pinned menu metaphor's illusion that it was the same menu before and after pinning. While TNT menus simply worked and looked perfectly and instantly when you pinned them, because it WAS the same menu after you pinned it, so it didn't have to flicker and change size and location and how it looked and behaved. Ironically, the NeWS Toolkit was MUCH better at managing X11 windows than the OLWM X11 window manager ever was, because our NeWS based X11 window manager "OWM" was deeply customizable and had a lot more advanced features like multiple rooms, scrolling virtual desktops, tabbed windows supporting draggable tabs on all four edges, resize edges, custom resize rubber banding animation, and pie menus, as well. It also never grabbed and froze the window server, and it took a hell of a lot less time and resources to develop than OLWM, which never lifted a finger to support TNT the way TNT bent over backwards to support X11.

NeWS Tab Window Demo -- Demo of the Pie Menu Tab Window Manager for The NeWS Toolkit 2.0. Developed and demonstrated by Don Hopkins:

https://www.youtube.com/watch?v=tMcmQk-q0k4

https://web.archive.org/web/19981203002306/http://www.art.ne...

>I39L window management complicates pinned menus enormously. TNT menus pin correctly, so that when you push the pin in, the menu window simply stays up on the screen, just like you'd expect. This is not the case with XView or even OLWM. Under an I39L window manager, the Open Look pinned menu metaphor completely breaks down. When you pin an X menu, it dissappears from the screen for an instant, then comes back at a different place, at a different size, with a different look and feel. If you're not running just the right window manager, pinned menus don't even have pins! There is no need for such "ICCCM compliant" behavior with TNT menus. When they're pinned, they can just stay there and manage themselves. But were TNT windows managed by an external I39L window manager, they would have to degenerate to the level of X menus.

https://web.archive.org/web/20000602215640/http://www.art.ne...

>I could go on and on, but I just lost my wonderful xbugtool, because I was having too much fun way too fast with those zany scrolling lists, so elmer the bug server freaked out and went off to la-la land, causing xbugtool to lock the windows and start "channeling", at the same time not responding to any events, so when I clicked on the scrolling list, the entire window system froze up and I had to wait for the input queue lock to break, but by the time the lock finally broke (it must have been a kryptonite), xbugtool had made up its mind, decided to meet its maker, finished core dumping, and exited with an astoundingly graceful thrash, which was a totally "dwim" thing for it to do at the time, if you think about it with the right attitude, since I had forgotten what I wanted to file a bug against in the first place anyway, and totally given up the idea of ever using bugtool to file a bug against itself, because bugtool must be perfect to have performed so splendidly!

From the news-makers archive:

>From: Skip Montanaro <crdgw1!montnaro@uunet.uu.net> Date: Feb 16, 1990

>Charles Hedrick writes concerning XNeWS problems. I have a couple of comments on the XNeWS situation.

>The olwm/pswm interface appears (unfortunately) to be stable as far as Sun is concerned. During XNeWS beta testing I complained about the lack of function key support, but was told it was an OpenLook design issue. (NeWS1.1 supported function keys, and you could do it in PostScript if you like.) Sun likes to tout how OpenLook is standard, and was designed by human factors types. As far as I'm concerned, nobody has had enough experience with good user interfaces to sit down and write a (horribly large, hard-to-read) spec from which a window manager with a "good" look-and-feel will be created. I'm convinced you still have to experiment with most user interfaces to get them right.

>As a simple example, consider Don Hopkins' recent tabframes posting. An extra goody added in tabframes is the edge-stretch thingies in the window borders. You can now stretch one edge easily, without inadvertently stretching the other edge connected to your corner-stretch thingie. Why did the OpenLook designers never think of this? SunView had that basic capability, albeit without visible window gadgetry. It wasn't like the idea was completely unheard of.

>I agree that running the XNeWS server with an alternate window manager is a viable option. Before I got my SPARCStation I used XNeWS in X11ONLY mode with gwm, which was the only ICCCM-compliant window manager I had available to me at the time. If you choose to use twm with XNeWS, I recommend you at least try the X11R4 version.

>From: William McSorland - Sun UK - Tech Support <will@willy.uk> Date: May 14, 1991 Subject: 1059370: Please evaluate

>Bug Id: 1059370 Category: x11news Subcategory: olwm Bug/Rfe: rfe Synopsis: OLWM does a Server Grab while the root menu is being displayed. Keywords: select, frame_busy, presses, left, mouse, server, grabbed Severity: 5 Priority: 5 Description:

>Customer inisisted on having this logged as a RFE and so:-

>When bringing up the root menu inside OW2.0 the window manager does a Server Grab hence forcing all its client applications output to be queued by the server, but not displayed.

>The customer recommends that this should be changed to make olwm more friendly.

>Apparently a number of other window managers don't do a server grab while the root menu is being displayed.

>From: Don Hopkins <hopkins@sun.com> Subject: 1059974: Bug report created

>Bug Id: 1059974 Category: x11news Subcategory: server Bug/Rfe: bug Synopsis: I have no mouse motion and my input focus is stuck in xbugtool!!! Keywords: I have no mouth and I must scream [Harlan Ellison] Severity: 1 Priority: 1 Description:

>This is my worst nightmare! None of my TNT or XView applications are getting any mouse motion events, just clicks. And my input focus is stuck in xbugtool, of all places!!! When I click in cmdtool, it gets sucked back into xbugtool when I release the button! And I'm not using click-to-type! I can make selections from menus (tnt, olwm, and xview) if I click them up instead of dragging, but nobody's receiving any mouse motion!

>I just started up a fresh server, ran two jets and a cmdtool, fired up a bugtool from one of the jets (so input focus must have been working then), and after xbugtool had throbbed and grunted around for a while and finally put up its big dumb busy window, I first noticed something was wrong when I could not drag windows around!

>Lucky thing my input focus ended up stuck in xbugtool!

>The scrollbar does not warp my cursor either... I can switch the input focus to any of xbugtool's windows, but I can't ... -- oomph errrgh aaaaahhh! There, yes!

>Aaaaah! What a relief! It stopped! I can move my mouse again!! Hurray!!! It started working when I opened a "jet" window, found I could type into it, so I moved the mouse around, the cursor disappeared, I typed, there were a couple of beeps, I still couldn't find the cursor, so I hit the "Open" key, the jet closed to an icon, and I could type to xbugtool again! And lo and behold now I can type into the cmdtool, too! Just by moving my cursor into it! What a technological wonder! Now I can start filing bug reports against cmdtool, which was the only reason I had the damn thing on my screen in the first place!!! I am amazed at the way the window system seems to read my mind and predict my every move, seeming to carry out elaborate practical jokes to prevent me from filing bugs against it. I had no idea the Open Windows desktop had such sophisticated and well integrated interclient communication!

>From: Don Hopkins <hopkins@sun.com> Subject: 1059976: Bug report created Date: May 21, 1991

>Bug Id: 1059976 Category: x11news Subcategory: olwm Bug/Rfe: bug Synopsis: OLWM menus are inconsistant with the rest of the desktop, Keywords: pinned menus, defaults, tracking, inconsistant look and feel, yet another open look toolkit Severity: 2 Priority: 2 Description:

>You can't set the default of a pinned menu by holding down the control key and clicking over an item.

>Pressing the middle button over the default of a pinned menu erases the default ring.

>You can't set the default of a unpinned menu by pressing the control key then the popping it up by pressing the MENU button on the mouse.

>When you're tracking a menu, and press the control key, the highlighting changes properly, from depressed to undepressed with a default ring, but when you release the control key before releasing the MENU button on the mouse, the highlighting does not change back to depressed without a default ring. Instead it stays the same, then changes to un-depressed without a default ring at the next mouse movement, and you have to move out and back into the menu item to see it depressed again.

>When you're dragging over a menu, then press the control key to set the default, then release the mouse button to make a selection, without releasing the control key, OLWM menus are stuck in default-setting mode, until the next time it sees a control key up transition while it is tracking a menu.

>Clicking the SELECT button on the abbreviated menu button on the upper left corner of the window frame (aka the close box or shine mark) should select the frame menu default, instead of always closing the window.

>The tracking when you press and release the control key over a menu pin is strange, as well. Push-pins as menu defaults are a dubious concept, and the HIT people should be consulted for an explaination or a correction.

>When you press the menu button in a submenu, it does not set the default of the menu that popped up the submenu, the way XView and TNT menus do. This behaviour also needs clarification from the HIT team.

>Pinned OLWM menus do not track the same way as unpinned menus. When you press down over an item, and drag to another item, the highlighting does not follow the mouse, instead the original item stays highlighted. The item un-highlights when the mouse exits it, but the menu highlighting should track the item underneath the cursor when the user is holding the mouse button down, just like they do with a non-pinned menu. The current behavior misleads you that the item would be selected if the button were released even though the cursor is not over the menu item, and is very annoying when you press down over a pinned menu item and miss the one you want, and expect to be able to simply drag over to the one you meant to hit.

>If we are crippling our menus this way on purpose, because we are afraid Apple is going to sue us, then Apple Computer has already done damage to Sun Microsystems without even paying their lawyers to go to court. We licensed the technology directly from Xerox, and we should not make our users suffer with an inferior interface because we are afraid of boogey-men.

>In general, OLWM is yet another OpenLook toolkit, and its menus are unlike any other menus on the desktop. This is a pity because the user interacts so closely with OLWM, and the conspicuous inconsistancy between the window manager and the Open Look applications that it frames leads people to expect inconsistancy and gives the whole system a very unreliable, unpredictable feel.

Other email about OLWM server grabs:

>1056853 x11news/x11: >OW Exit notice hangs system causing input queue lock brokens

>Status: Desktop integration issue. Marked in bug traq as evaluated.

>This is an X-and-NeWS integration issue and is a terribly complicated problem. X11 does not expect that server-internal locking will ever time out. X11 has similar situations to the one mentioned in the bug report where a client grabs the server and then a passive grab triggers. And it works fine. The difference is that the effect of the passive grab doesn't time-out, thereby causing an inconsistent state.

>One possibility cited for a fix is to change the server to stop distributing events to synchronous NeWS interests while an X client has the server grabbed. But this might only result in moving the problem around and not in solving the real problem.

>According to Stuart Marks, olwm could grab the keyboard and mouse before grabbing the server and that might get around this particular problem.

>ACTION: Stuart Marks should be supported in making this change.

Stewart, is any of that lazy unfair criticism?

replies(1): >>shadow+tP

>>bgm197+Mx
Ah, yes, NIS+. Thank you for the correction.

I also had in mind that OpenBSD deliberately and rigorously only refers to "YP" ("Yellow Pee"). Google "OpenBSD" and "NIS" and most of the hits you'll see directly from the OpenBSD project are from commit logs for patches removing accidental usages of "NIS" in initial YP-related feature commits. I'm not quite sure why they do that. I've kind of assumed it's to make clear that they have little interest in addressing vendor compatibility issues, and to emphasize that YP support, such as it is, is narrowly tailored to supporting the needs of the OpenBSD project itself. That's quite different from IPv6, IPSec/IKE, and even NFSv3, where cross-vendor interoperability is a concern (within reason).

replies(1): >>DonHop+m11

>>DonHop+fH
You've posted two extremely long posts about this here. If you helped write the Unix-Haters Handbook, that makes this argument older than a decent chunk of the people here, myself included. That's an impressively long time to hold a grudge.

replies(1): >>DonHop+DQ

>>shadow+tP
The topic of this discussion is "NFS: The Early Years", so if the Unix Haters Handbook is older than you are, then the topic of this discussion, The Early Years of NFS, is even older still.

That's an impressively long time for anyone born and working professionally for Sun Microsystems before The Early Years of NFS to hold the incorrect opinion that NFS doesn't suck. ;) So when smarks makes the provably false claim that NFS doesn't suck, and accuses me of being "lazy" for disagreeing with that, I'm glad I was diligent enough to keep the receipts, and generous enough to share them.

replies(1): >>shadow+yR

>>DonHop+DQ
Point taken. The security situation in particular is unbelievable in hindsight.

replies(1): >>DonHop+eS

>>shadow+yR
Believe it.

I just don't like being called "lazy" for saying "NFS Sucks" by the same guy whose window manager was so lazy it unnecessarily grabbed the X11 server all the time and locked up the window system for minutes at a time, and whose menus flickered and moved and resized and drew and tracked differently when you pinned them, since I've fairly and un-lazily written in great detail about NFS and other Sun security issues numerous times, and un-lazily implemented OPEN LOOK menus and a TNT X11/NeWS window manager that didn't suffer from all those usability problems.

Speaking of lazy menus: Correctly implemented Open Look pinned menus actually had to support two identical hot-synced menus existing and updating at the same time, in case you pinned a menu, then popped the same menu up again from the original location. The TNT menus would lazily create a second popup menu clone only when necessary (when it was already pinned and you popped it up again), and correctly supported tracking and redrawing either menu, setting the menu default item with the control key and programmatically changing other properties by delegating messages to both menus, so it would redraw the default item ring highlighting on both menus when you changed the default, or any other property.
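A very loose sketch of that lazy-clone idea, in Python rather than PostScript. This is not the TNT code (that's in menu.ps, linked below); the class and attribute names are invented, and for brevity this version only propagates changes from the original to its clone.

```python
class Menu:
    def __init__(self, items, default=0):
        self.items = list(items)
        self.default = default
        self.pinned = False
        self.clone = None  # second copy, created only when needed

    def pin(self):
        # Pinning keeps the very same object on screen; nothing is copied.
        self.pinned = True

    def popup(self):
        """Return the menu object to show as a transient popup."""
        if not self.pinned:
            return self
        if self.clone is None:
            # Already pinned and popped up again: now a copy is needed.
            self.clone = Menu(self.items, self.default)
        return self.clone

    def set_default(self, index):
        # Delegate the change so both visible copies stay in sync.
        self.default = index
        if self.clone is not None:
            self.clone.set_default(index)


menu = Menu(["Open", "Close", "Quit"])
first_popup = menu.popup()      # not pinned: the menu itself
menu.pin()
clone_before = menu.clone       # still no copy after pinning
second_popup = menu.popup()     # pinned: lazily creates the copy
menu.set_default(2)
```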

Object oriented programming in The NeWS Toolkit was a lot more like playing with a dynamic Smalltalk interpreter, than pulling teeth with low level X11 programming in C with a compiler and linker plowing through mountains of include files and frameworks, so it was actually interactively FUN, instead of excruciatingly painful, and we could get a lot more work done in the same amount of time than X11 programmers.

Consequently, TNT had a much more thorough and spec-consistent implementation of pinned menus than OLWM, XView, OLIT, or MOOLIT, because NeWS was simply a much better window system than X11, and we were not lazy and didn't choose to selectively ignore or reinterpret the more challenging parts of the Open Look spec, like the other toolkits did because X-Windows and C made life so difficult.

See the comments in the "Clone" method and refs to the "PinnedCopy" instance variable in the PostScript TNT menu source code:

https://donhopkins.com/home/code/menu.ps

    % Copy this menu for pinning.  Factored out to keep the pinning code
    % easier to read.  The clone has a few important differences, such as
    % no pin or label regardless of the pin/label of the original, but is
    % otherwise as close a copy as we can manage.
    % otherwise as close a copy as we can manage.

TNT Open Look Menu design doc:

https://donhopkins.com/home/archive/HyperLook/tnt-menu-desig...

replies(1): >>tonyg+G44

>>wahern+0K
Speaking of YP (which I always thought sounded like a brand of moist baby poop towelettes), BSD, wildcard groups, SunRPC, and Sun's ingenuous networking and security and remote procedure call infrastructure, who remembers Jordan Hubbard's infamous rwall incident on March 31, 1987?

https://news.ycombinator.com/item?id=25156006

https://en.wikipedia.org/wiki/Jordan_Hubbard#rwall_incident

>rwall incident

>On March 31, 1987 Hubbard executed an rwall command expecting it to send a message to every machine on the network at University of California, Berkeley, where he headed the Distributed Unix Group. The command instead began broadcasting Hubbard's message to every machine on the internet and was stopped after Hubbard realised the message was being broadcast remotely after he received complaints from people at Purdue University and University of Texas. Even though the command was terminated, it resulted in Hubbard receiving 743 messages and complaints, including one from the Inspector General of ARPAnet.

I was logged in on my Sun workstation "tumtum" when it happened, so I received his rwall too, and immediately sent him a humorous email with the subject of "flame flame flame" which I've lost in the intervening 35 years, but I still have a copy of his quick reply:

    From: Jordan K. Hubbard <jkh%violet.Berkeley.EDU@berkeley.edu>
    Date: Tue, Mar 31, 1987, 11:02 PM
    To: Don Hopkins <don@tumtum.cs.umd.edu>
    Subject: re: flame flame flame

    Thanks, you were nicer than most.. Here's the stock letter I've been
    sending back to people:

    Thank you, thank you..

    Now if I can only figure out why a lowly machine in a basement somewhere
    can send broadcast messages to the entire world. Doesn't seem *right*
    somehow.

                                        Yours for an annoying network.

                                        Jordan

    P.S. I was actually experimenting to see exactly now bad a crock RPC was.
    I'm beginning to get an idea. I look forward to your flame.

                                                Jordan

Here's the explanation he sent to hackers_guild, and some replies from old net boys like Milo Medin (who said the program manager of the Arpanet in the Information Science and Technology Office of DARPA Dennis G. Perry said they would kick UCB off the Arpanet if it ever happened again), Mark Crispin (who presciently proposed cash rewards for discovering and disclosing security bugs), and Dennis G. Perry himself:

    From: Jordan K. Hubbard <jkh%violet.Berkeley.EDU@berkeley.edu>
    Date: April 2, 1987
    Subject: My Broadcast

    By now, many of you have heard of (or seen) the broadcast message I sent to
    the net two days ago. I have since received 743 messages and have
    replied to every one (either with a form letter, or more personally
    when questions were asked). The intention behind this effort was to
    show that I wasn't interested in doing what I did maliciously or in
    hiding out afterwards and avoiding the repercussions. One of the
    people who received my message was Dennis Perry, the Inspector General
    of the ARPAnet (in the Pentagon), and he wasn't exactly pleased.
    (I hear his Interleaf windows got scribbled on)

    So now everyone is asking: "Who is this Jordan Hubbard, and why is he on my
    screen??"

    I will attempt to explain.

    I head a small group here at Berkeley called the "Distributed Unix Group".
    What that essentially means is that I come up with Unix distribution software
    for workstations on campus. Part of this job entails seeing where some of
    the novice administrators we're creating will hang themselves, and hopefully
    prevent them from doing so. Yesterday, I finally got around to looking
    at the "broadcast" group in /etc/netgroup which was set to "(,,)". It
    was obvious that this was set up for rwall to use, so I read the documentation
    on "netgroup" and "rwall". A section of the netgroup man page said:

      ...
         Any of three fields can be empty, in which case it signifies
         a wild card.  Thus
                    universal (,,)
         defines a group to which everyone belongs.  Field names that ...
      ...

    Now "everyone" here is pretty ambiguous. Reading a bit further down, one
    sees discussion on yellow-pages domains and might be led to believe that
    "everyone" was everyone in your domain. I know that rwall uses point-to-point
    RPC connections, so I didn't feel that this was what they meant, just that
    it seemed to be the implication.

    Reading the rwall man page turned up nothing about "broadcasts". It doesn't
    even specify the communications method used. One might infer that rwall
    did indeed use actual broadcast packets.

    Failing to find anything that might suggest that rwall would do anything
    nasty beyond the bounds of the current domain (or at least up to the IMP),
    I tried it. I knew that rwall takes awhile to do its stuff, so I left
    it running and went back to my office. I assumed that anyone who got my
    message would let me know.. Boy, was I right about that!

    After the first few mail messages arrived from Purdue and Utexas, I begin
    to understand what was really going on and killed the rwall. I mean, how
    often do you expect to run something on your machine and have people
    from Wisconsin start getting the results of it on their screens?

    All of this has raised some interesting points and problems.

    1. Rwall will walk through your entire hosts file and blare at anyone
       and everyone if you use the (,,) wildcard group. Whether this is a bug
       or a feature, I don't know.

    2. Since rwall is an RPC service, and RPC doesn't seem to give a damn
       who you are as long as you're root (which is trivial to be, on a work-
       station), I have to wonder what other RPC services are open holes. We've
       managed to do some interesting, unauthorized, things with the YP service
       here at Berkeley, I wonder what the implications of this are.

    3. Having a group called "broadcast" in your netgroup file (which is how
       it comes from sun) is just begging for some novice admin (or operator
       with root) to use it in the mistaken belief that he/she is getting to
       all the users. I am really surprised (as are many others) that this has
       taken this long to happen.

    4. Killing rwall is not going to solve the problem. Any fool can write
       rwall, and just about any fool can get root priviledge on a Sun workstation.
       It seems that the place to fix the problem is on the receiving ends. The
       only other alternative would be to tighten up all the IMP gateways to
       forward packets only from "trusted" hosts. I don't like that at all,
       from a standpoint of reduced convenience and productivity. Also, since
       many places are adding hosts at a phenominal rate (ourselves especially),
       it would be hard to keep such a database up to date. Many perfectly well-
       behaved people would suffer for the potential sins of a few.

    I certainly don't intend to do this again, but I'm very curious as to
    what will happen as a result. A lot of people got wall'd, and I would think
    that they would be annoyed that their machine would let someone from the
    opposite side of the continent do such a thing!

                             Jordan Hubbard
                             jkh@violet.berkeley.edu (ucbvax!jkh)
                             Computer Facilities & Communications.
                             U.C. Berkeley

    From: Milo S. Medin <medin@orion.arpa>
    Date: Apr 6, 1987, 5:06 AM

    Actually, Dennis Perry is the head of DARPA/IPTO, not a pencil pusher
    in the IG's office.  IPTO is the part of DARPA that deals with all
    CS issues (including funding for ARPANET, BSD, MACH, SDINET, etc...).
    Calling him part of the IG's office on the TCP/IP list probably didn't
    win you any favors.  Coincidentally I was at a meeting at the Pentagon
    last Thursday that Dennis was at, along with Mike Corrigan (the man
    at DoD/OSD responsible for all of DDN), and a couple other such types
    discussing Internet management issues, when your little incident
    came up.  Dennis was absolutely livid, and I recall him saying something
    about shutting off UCB's PSN ports if this happened again.  There were
    also reports about the DCA management types really putting on the heat
    about turning on Mailbridge filtering now and not after the buttergates
    are deployed.  I don't know if Mike St. Johns and company can hold them
    off much longer.  Sigh...  Mike Corrigan mentioned that this was the sort
    of thing that gets networks shut off.  You really pissed off the wrong
    people with this move! 

    Dennis also called up some VP at SUN and demanded this hole
    be patched in the next release.  People generally pay attention
    to such people.

                                            Milo

    From: Mark Crispin <MRC%PANDA@sumex-aim.stanford.edu>
    Date: Mon, Apr 6, 1987, 10:15 AM

    Dan -

         I'm afraid you (and I, and any of the other old-timers who
    care about security) are banging your head against a brick wall.
    The philsophy behind Unix largely seems quite reminiscent of the
    old ITS philsophy of "security through obscurity;" we must
    entrust our systems and data to a open-ended set of youthful
    hackers (the current term is "gurus") who have mastered the
    arcane knowledge.

         The problem is further exacerbated by the multitude of slimy
    vendors who sell Unix boxes without sources and without an
    efficient means of dealing with security problems as they
    develop.

         I don't see any relief, however.  There are a lot of
    politics involved here.  Some individuals would rather muzzle
    knowledge of Unix security problems and their fixes than see them
    fixed.  I feel it is *criminal* to have this attitude on the DDN,
    since our national security in wartime might ultimately depend
    upon it.  If there is such a breach, those individuals will be
    better off if the Russians win the war, because if not there will
    be a Court of Inquiry to answer...

         It may be necessary to take matters into our own hands, as
    you did once before.  I am seriously considering offering a cash
    reward for the first discoverer of a Unix security bug, provided
    that the bug is thoroughly documented (with both cause and fix).
    There would be a sliding cash scale based on how devastating the
    bug is and how many vendors' systems it affects.  My intention
    would be to propagate the knowledge as widely as possible with
    the express intension of getting these bugs FIXED everywhere.

         Knowledge is power, and it properly belongs in the hands of
    system administrators and system programmers.  It should NOT be
    the exclusive province of "gurus" who have a vested interest in
    keeping such details secret.

    -- Mark --

    PS: Crispin's definition of a "somewhat secure operating system":
    A "somewhat secure operating system" is one that, given an
    intelligent system management that does not commit a blunder that
    compromises security, would withstand an attack by one of its
    architects for at least an hour.

    Crispin's definition of a "moderately secure operating system": a
    "moderately secure operating system" is one that would withstand
    an attack by one of its architects for at least an hour even if
    the management of the system are total idiots who make every
    mistake in the book.
    -------

    From: Dennis G. Perry <PERRY@vax.darpa.mil>
    Date: Apr 6, 1987, 3:19 PM

    Jordan, you are right in your assumptions that people will get annoyed
    that what happened was allowed to happen.

    By the way, I am the program manager of the Arpanet in the Information
    Science and Technology Office of DARPA, located in Roslin (Arlington), not
    the Pentagon.

    I would like suggestions as to what you, or anyone else, think should be
    done to prevent such occurances in the furture.  There are many drastic
    choices one could make.  Is there a reasonable one?  Perhaps some one
    from Sun could volunteer what there action will be in light of this
    revelation.  I certainly hope that the community can come up with a good
    solution, because I know that when the problem gets solved from the top
    the solutions will reflect their concerns.

    Think about this situation and I think you will all agree that this is
    a serious problem that could cripple the Arpanet and anyother net that
    lets things like this happen without control.

    dennis
    -------
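The wildcard rule from the netgroup man page excerpt quoted in Jordan's message can be sketched in a few lines of Python. This is only an illustration of the "empty field matches anything" rule, not the real YP lookup code: the function names and the sample "lab" group are made up, and nested netgroup references are left out.

```python
def triple_matches(triple, host, user, domain):
    """A netgroup triple is (host, user, domain); an empty field is a wildcard."""
    return all(field == "" or field == value
               for field, value in zip(triple, (host, user, domain)))


def in_netgroup(netgroups, group, host, user, domain):
    """True if any triple in the named group matches the given identity."""
    return any(triple_matches(t, host, user, domain)
               for t in netgroups.get(group, []))


# Hypothetical netgroup table. "broadcast" mirrors the (,,) entry from the email.
netgroups = {
    "broadcast": [("", "", "")],
    "lab": [("labhost", "", "example.edu")],
}
```

With that table, `in_netgroup(netgroups, "broadcast", ...)` is true for any host, user, and domain at all, which is why "everyone" turned out to be a lot more people than one YP domain.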

Also:

http://catless.ncl.ac.uk/Risks/4.73.html#subj10.1

https://everything2.com/title/Jordan+K.+Hubbard

replies(2): >>dekhn+gu1 >>tptace+Xn2

>>wahern+Qf
(PM-T on the EFS team)

S3 and EFS actually are quite different. Files on EFS are update-able, rename-able and link-able (i.e., what’s expected from a file system), while S3 objects are immutable once they are created. This comes from the underlying data structures. EFS uses inodes and directories while S3 is more of a flat map.
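A toy Python sketch of why the data structure matters, using rename as the example. This says nothing about how EFS or S3 are actually implemented; both functions and the sample data are hypothetical.

```python
def flat_rename_prefix(store, old_prefix, new_prefix):
    """'Rename a directory' in a flat key -> blob map: every matching key moves."""
    moved = 0
    for key in [k for k in store if k.startswith(old_prefix)]:
        store[new_prefix + key[len(old_prefix):]] = store.pop(key)
        moved += 1
    return moved


def tree_rename(directory, old_name, new_name):
    """Rename an entry in a directory (name -> inode number): one entry changes."""
    directory[new_name] = directory.pop(old_name)
    return 1


# Flat map: keys only look like paths.
flat = {"logs/a": b"1", "logs/b": b"2", "etc/c": b"3"}
# Directory tree: the root directory maps names to inode numbers.
root = {"logs": 2, "etc": 3}
```

In the flat map the cost of the rename grows with the number of objects under the prefix; in the inode-and-directory model it is a single directory entry update regardless of how much lives underneath.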

Protocol-wise EFS uses standard NFS 4.1. We added some optional innovations outside the protocol that you can use through our mount helper (mount.efs). This includes in-transit encryption with TLS (you can basically talk TLS to our endpoint and we will detect that automatically), and we support strong client auth using SigV4 over an X.509 client certificate.

replies(2): >>throw0+Uc1 >>rfoo+cN2

>>smarks+2m
As others have said, I think the title was more clickbait-y. Olaf was the person who wrote the in-kernel NFS server so I think he at least appreciated NFS somewhat. The paper makes a few reasonable criticisms, many of which are addressed now.

My favorite criticism from that paper is that NFS clients reused the source port so that the server can detect whether a new connection is the same client or not. This confuses stateful packet filtering on the network because both connections now have the same 5-tuple, and packets on the new connection can look like out-of-window packets on the old connection. This can get connections blackholed depending on the network. This was fixed a few years ago in the Linux client for NFS v4.1, since that version of the protocol already has a different way to identify clients. Before this was fixed, EFS had to document a workaround.
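To make the 5-tuple problem concrete, here is a deliberately simplified Python sketch of a stateful filter that tracks one sequence window per flow. Real connection trackers are far more involved (SYN handling, timeouts, sequence wraparound); the class name, the window size, and the addresses below are all assumptions made for illustration.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "proto src_ip src_port dst_ip dst_port")


class ToyFilter:
    """Toy stateful filter: remembers the next expected sequence number per flow."""

    WINDOW = 65535  # assumed window size; wraparound is ignored in this sketch

    def __init__(self):
        self.next_seq = {}  # FiveTuple -> next expected sequence number

    def accept(self, flow, seq, length):
        expected = self.next_seq.get(flow)
        if expected is None:
            # First packet seen for this 5-tuple: start tracking it.
            self.next_seq[flow] = seq + length
            return True
        if not (expected <= seq < expected + self.WINDOW):
            # Looks like a stray packet from the connection we already track.
            return False
        self.next_seq[flow] = max(expected, seq + length)
        return True


fw = ToyFilter()
old = FiveTuple("tcp", "10.0.0.5", 803, "10.0.0.9", 2049)
first = fw.accept(old, 1000, 100)          # original connection
second = fw.accept(old, 1100, 100)         # in-window follow-up
reconnect = fw.accept(old, 4_000_000, 1)   # new connection, same source port
fresh_port = fw.accept(old._replace(src_port=804), 4_000_000, 1)
```

The reconnect that reuses source port 803 is indistinguishable from the old flow and gets dropped as out-of-window, while the same packet from a different source port is treated as a new flow and passes.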

>>geertj+t31
> This includes in-transit encryption with TLS […]

Will EFS be updated to use the NFS-TLS RFC once it settles down some?

* https://datatracker.ietf.org/doc/html/draft-ietf-nfsv4-rpc-t...

replies(2): >>chasil+jo1 >>geertj+Fo1

>>throw0+Uc1
It looked like work on this had stopped. Is there still hope that it might become a published RFC?

replies(2): >>throw0+pp1 >>geertj+Cq1

>>throw0+Uc1
> Will EFS be updated to use the NFS-TLS RFC once it settles down some?

I can't commit on a public forum for obvious reasons but we'll definitely take a serious look at this, especially when the Linux client starts supporting this. We did consult with the authors of that draft RFC earlier and it should be relatively easy for us to adopt this.

replies(1): >>throw0+tp1

>>chasil+jo1
Activity on the NFSv4 mailing list:

* https://mailarchive.ietf.org/arch/browse/nfsv4/

But no recent commits to the draft:

* https://github.com/chucklever/i-d-rpc-tls

replies(1): >>chasil+Kw1

>>geertj+Fo1
Cool.

>>chasil+jo1
> It looked like work on this had stopped. Is there still hope that it might become a published RFC?

I don't know, I hope it will.

Not to go on too much of a tangent, and at the risk of sounding like my employer's fanboy, but one of the great things about working at AWS (I'm being honest, and yes we are hiring SDEs and PMs) is that we 100% focus on the customer. When our customers told us they needed encryption in transit, we figured out we could simply offer them transport-level TLS independent from the application-level RPC protocol. It may not have been the standards-compliant approach, but our customers have been enjoying fast reliable encryption for over 4 years now [1]. It solves a real problem because customers have compliance requirements.

[1] https://aws.amazon.com/about-aws/whats-new/2018/04/amazon-ef...

>>DonHop+m11
thanks, always fun to read this history.

>>throw0+pp1
Thanks for the mailing list link.

Here is the status:

"This one had to be paused for a bit to work out some issues around using a wider type to hold the epoch value, to accomodate some DTLS-SCTP use cases involving associations expected to remain up for years at a time. https://github.com/tlswg/dtls13-spec/issues/249 ends up covering most of the topics, though the discussion is a bit jumbled. We have a proposed solution with almost all the signoffs needed, and should be attempting to confirm this approach at the session at IETF 112 next week...

"I'm sorry that these have been taking so long; these delays were unexpected."

>>smarks+(OP)
That's Matthew Wilcox you're referring to, and he's right.

It's not just about reads and writes - doing a stateless protocol for a block device is fine. Think more about unlink...
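One way to read the unlink hint, sketched in Python under the assumption of a local POSIX filesystem: a process can keep reading a file after its name has been removed, because the kernel tracks the open file. A stateless server has no record of the client's open, which is why NFS clients traditionally fall back on the "silly rename" trick (the `.nfsXXXX` files). The helper name below is made up.

```python
import os
import tempfile


def read_after_unlink(payload):
    """Write payload, unlink the file while it is still open, then read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                      # the name is gone...
        name_gone = not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))     # ...but the open file is still readable
    finally:
        os.close(fd)
    return name_gone, data
```

Locally this works because "the file is open" is state the kernel holds; that is exactly the kind of state a stateless protocol does not carry to the server.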

>>DonHop+m11
Immediately one of the all-time great HN posts.

>>geertj+t31
> in-transit encryption with TLS

Last I checked it spawns a HAProxy on the client and points the in-kernel NFS client to this HAProxy on lo, is this still the case?

And, out of curiosity: now that EFS claims 600us average read latency, would the extra hop matter?

replies(2): >>acdha+IN2 >>geertj+Af3

>>rfoo+cN2
It currently uses stunnel to encrypt the connection. I only started using it a couple of years ago, but I have never seen a reference to HAProxy.

>>rfoo+cN2
The sibling comment is correct. The EFS mount helper starts up and manages an stunnel process. We have not seen a significant impact on latency from the stunnel process.

>>DonHop+eS
> Object oriented programming in The NeWS Toolkit was a lot more like playing with a dynamic Smalltalk interpreter, than pulling teeth with low level X11 programming in C

Funnily enough I've been writing a pure-Smalltalk X11 protocol implementation recently, for Squeak, and it starts to have some of the feel you describe. It generates Smalltalk code from the XML xcbproto definitions. It's at the point now where you can send X11 requests interactively in a Workspace, etc., which is fun ("playing with a dynamic Smalltalk interpreter"), and I'm working on integrating it with Morphic. Anyway, thought you might enjoy the idea.