Tell HN: HN Moved from M5 to AWS

>>1vuio0+(OP)
I was surprised they only had one backup server, especially given the competitive price of rackmount hardware these days. More replicas needed.

Although this was a fun exercise to learn how lost I feel without HN. Damn.

>>metada+7e
Our thinking was: (1) keep a hot standby to fail over to when we need it. That keeps downtime to seconds in routine cases (like pre-planned maintenance) and to minutes or an hour in most failure cases; for example, when our primary server died last night, HN was down for about an hour while we brought up the standby. And (2) in the unlikely event that both the primary and standby servers fail at the same time, be able to bring up a fresh server from backup within hours, not days. The latter case is what happened today, and in the end we were down for just under 8 hours. (Assuming we don't sink back into the pit of hell overnight.)

Assuming things don't fail again in the next day or two, since we still have a lot to take care of (fingers crossed—definitely not gloating), I feel like this was pretty reasonable. We don't have a lot of dev or ops resources—few people work on HN, and only me full-time these days. The more complex one's replica architecture, the higher the maintenance costs. The simplicity of our setup has served us well in the 9 years that we've been running it, and I feel like the tradeoff of "several hours downtime once a decade" is worth it if you draw one of those risk/cost managerial whiteboard things.
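As a rough back-of-the-envelope check on that tradeoff, here is a minimal sketch of the availability arithmetic. It assumes about 9 hours of total downtime (roughly 1 hour for the failover plus just under 8 hours for the restore) over the 9 years mentioned above, and it ignores any other outages in that period. The function name and the 365.25-day year are illustrative choices, not anything HN actually uses.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length, including leap years


def availability(downtime_hours: float, years: float) -> float:
    """Return the fraction of time the service was up over the period."""
    total_hours = years * HOURS_PER_YEAR
    return 1.0 - downtime_hours / total_hours


# Assumed figures: ~1 hour (failover) + ~8 hours (restore) over 9 years.
uptime = availability(downtime_hours=9, years=9)
print(f"{uptime:.2%}")
```

Under those assumptions the result lands just short of "four nines", which is the kind of figure a simple primary-plus-standby setup can plausibly reach without extra replica complexity.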

>>dang+8g
It might be worth considering a way to get a "we're working on it" notice up quickly (HN status on Twitter worked, but it's kind of nicer when something loads at the main address). But an 8-hour outage once a decade for something that's not really critical is pretty good; no need to increase complexity, although try to get some storage diversity for the future, now that you've learned about that.
