(I did that because the anti-crawler protections unfortunately also hit some legit users, and we don't want to block legit users. However, it seems I turned the knobs down too far.)
In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.
I'll add more as we find out more, but it probably won't be till later this afternoon PST.
Edit: later than I expected, but for those still following, the main things I've learned are (1) pkill wasn't able to kill SBCL this time - we have a script that does that when HN stops responding, but it didn't work, so we'll revise the script; and (2) how to get PagerDuty not to let you go back to sleep if your site is actually still down.
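For anyone curious what a more stubborn version of the script in (1) might look like, here is a minimal Unix-only sketch. It is not the actual HN script: the thread doesn't say which signal the script sends or why pkill failed, so it simply assumes the common pattern of sending SIGTERM, waiting a grace period, then escalating to SIGKILL. The names `kill_with_escalation` and `confirmed_up` are hypothetical. The second function is a rough take on (2): only call an incident resolved after several consecutive health checks pass. It doesn't touch any PagerDuty API.

```python
import os
import signal
import time
from typing import Callable


def process_alive(pid: int) -> bool:
    """Return True if a process with this pid still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, we just aren't allowed to signal it
    return True


def kill_with_escalation(
    pid: int,
    grace_seconds: float = 5.0,
    poll_interval: float = 0.1,
    send: Callable[[int, int], None] = os.kill,
    alive: Callable[[int], bool] = process_alive,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Ask politely with SIGTERM, then escalate to SIGKILL after a grace period."""
    try:
        send(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "not running"

    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not alive(pid):
            return "terminated"
        sleep(poll_interval)

    try:
        send(pid, signal.SIGKILL)  # cannot be caught or ignored by the target
    except ProcessLookupError:
        return "terminated"  # it exited just before we escalated
    return "killed"


def confirmed_up(
    check: Callable[[], bool],
    attempts: int = 3,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if `check` passes `attempts` times in a row."""
    for i in range(attempts):
        if not check():
            return False
        if i < attempts - 1:
            sleep(interval)
    return True
```

The `send`, `alive`, and `sleep` parameters exist only so the logic can be exercised without killing a real process. In real use, `check` would be whatever "is the site actually serving pages" test you trust, run from outside the box.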
We all knew that, but I hadn't seen any confirmation before this.
I think you're confusing popularity with criticality. I'm sure everyone in here can withstand a few hours without browsing the page.
It's dang's baby at this point, and this is a good thing, as long as HN doesn't affect his life in ways he doesn't want.
However, when something I care about crashes and burns once in a blue moon, I make sure to put the fire out, at least enough for it to survive till regular hours. The things I care about can be business or personal, and nobody has to bug me about them.
Maybe we shouldn't make any assumptions about people we don't personally know, while we are at it.