zlacker

[return to "Tell HN: HN was down"]
1. dang+zk[view] [source] 2025-12-17 18:09:25
>>uyzstv+(OP)
Yes, sorry! We're investigating, but my current theory is we got overloaded because I relaxed some of our anti-crawler protections a few days ago.

(The reason I did that is that the anti-crawler protections also unfortunately hit some legit users, and we don't want to block legit users. However, it seems that I turned the knobs down too far.)

In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.

I'll add more as we find out more, but it probably won't be till later this afternoon PST.

Edit: later than I expected, but for those still following, the main things I've learned are (1) pkill wasn't able to kill SBCL this time - we have a script that does that when HN stops responding, but it didn't work, so we'll revise the script; and (2) how to get PagerDuty not to let you go back to sleep if your site is actually still down.
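A minimal sketch of the kind of watchdog revision described above. Everything here is an assumption, not HN's actual script: the health-check URL, the `sbcl` process name, and the escalation delay. The key fix it illustrates is escalating from `pkill`'s default SIGTERM (which a wedged Lisp image can ignore) to SIGKILL (which cannot be caught):

```shell
#!/bin/sh
# Hypothetical watchdog sketch. HN_URL and the process name "sbcl" are
# assumptions for illustration.
HN_URL="${HN_URL:-http://127.0.0.1:8080/}"

check_alive() {
  # --max-time makes a *hung* server count as down, not just an error status.
  curl -fsS --max-time 10 "$HN_URL" >/dev/null 2>&1
}

kill_sbcl() {
  # pkill sends SIGTERM by default; a wedged process can ignore or block it.
  pkill -x sbcl
  sleep 5
  # If the process survived SIGTERM, escalate to SIGKILL.
  pgrep -x sbcl >/dev/null 2>&1 && pkill -KILL -x sbcl
  return 0
}

# Guarded so sourcing this file for its functions has no side effects.
if [ "${RUN_WATCHDOG:-0}" = "1" ] && ! check_alive; then
  kill_sbcl
fi
```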

2. bicepj+Bw[view] [source] 2025-12-17 19:02:02
>>dang+zk
Even though HN provides a Firebase endpoint, crawlers still hit the site directly?
3. dang+F71[view] [source] 2025-12-17 22:08:13
>>bicepj+Bw
Oh my god. It's the crawlopalypse.
4. collin+eP3[view] [source] 2025-12-18 18:32:59
>>dang+F71
Yes. It's hard to explain the experience of hosting a website since 2023.

A crazy number of really dumb bots loading every URL on the website in a full headless browser, with the default Chrome user-agent string. All from different IP addresses, across various countries and ASNs.

These crawlers are completely automated and simply crawl _everything_ and don't care at all if there's value in what they're crawling or if there's duplicate content, etc.

There's no attempt at efficiency; they just blindly crawl the entire internet 24/7. Every page load (1 per second or more?) comes from a different IP address.
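For what it's worth, the "every request from a different IP, same default Chrome UA" pattern shows up clearly in access logs. A rough sketch, assuming an nginx/Apache "combined" log format (the function name and the bare `Chrome` match are illustrative assumptions):

```shell
# Sketch, assuming nginx/Apache "combined" log format.
# With -F'"', field 6 is the quoted User-Agent string; the client IP is the
# first space-separated token of the line.
count_chrome_ips() {
  awk -F'"' '$6 ~ /Chrome/ {
               split($0, parts, " ")
               if (!(parts[1] in seen)) { seen[parts[1]] = 1; n++ }
             }
             END { printf "distinct Chrome-UA IPs: %d\n", n }' "$1"
}
# Usage: count_chrome_ips /var/log/nginx/access.log
```

A distinct-IP count close to the total Chrome-UA request count is the signature described above: a distributed crawler rather than a few hot clients.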
