(The reason I did that is that the anti-crawler protections also unfortunately hit some legit users, and we don't want to block legit users. However, it seems that I turned the knobs down too far.)
In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.
I'll add more as we find out more, but it probably won't be till later this afternoon PST.
Edit: later than I expected, but for those still following, the main things I've learned are (1) pkill wasn't able to kill SBCL this time - we have a script that does that when HN stops responding, but it didn't work, so we'll revise the script; and (2) how to get PagerDuty not to let you go back to sleep if your site is actually still down.
Enjoy your deserved sleep and if for a couple of hours it's down, so be it.
Thanks for your continued service!
Though I will say, HN is a pretty great source of information about major outages like the recent AWS and Cloudflare issues. I had a moment this morning where I thought, oh, is there a larger issue and then, oh, HN is down, huh, the next option is so far down my list that it's going to take me a moment to think of it.
I hope that serves as a testament to how great this site and the community is. Thanks for all your hard work keeping it that way!
Option 4: take your grab bag with the tcp over IP shortwave radio, sextant and head for pre-cached month supply of food in the hills.