After many years of remaining static, HN's IP address changed.^1
Old: 209.216.230.240
New: 50.112.136.166
Perhaps this is temporary.
Little known fact: HN is also available through Cloudflare. Unlike CF, AWS does not support TLS1.3.^2 This is not working while HN uses the AWS IP.
1. Years ago someone on HN tried to argue with me that IP addresses will never stay the same for very long. I used HN as an example of an address that does not change very often. I have been waiting for years. I collect historical DNS data. When I remind HN readers that most site addresses are more static than dynamic, I am basing that statement on evidence I have collected.
2. Across the board, so to speak. Every CF-hosted site I have encountered supports TLS1.3. Not true for AWS. Many (most?^3) only offer TLS1.2.
3. Perhaps a survey is in order.
It’s also a reminder that beefy virtual servers are pretty darn beefy nowadays. I wonder which tier they went with.
Not even a joke.
This seemed implausible so I looked into it, and it's wrong as stated (at best, it needs to be made more precise to capture what you intended). First, you've mentioned Cloudflare, but the equivalent AWS product (CloudFront) does support TLS 1.3 (https://aws.amazon.com/about-aws/whats-new/2020/09/cloudfron...).
HN isn't behind CloudFront, though, so you probably mean their HTTP(s) load balancers (ALB) don't support TLS 1.3. Even that's an incomplete view of the load balancing picture, since the network load balancers (NLB) do support TLS 1.3, https://aws.amazon.com/about-aws/whats-new/2021/10/aws-netwo....
Even during the most extreme AWS events, my EC2 instances running dedicated servers kept seeing Internet traffic.
Sure. But without seeing the other side's argument, I have to wonder if their point wasn't that they're not designed to be stable for the purpose of identifying a service/thing on the Internet; things can and do move and change. Hardware failure is a good example of that. Just like a house address, those too are normally stable but people can & do move. Just with software, it's like we look our friend up in the white pages¹ prior to every visit, which one might not do in real life.
¹oh God I'm dating myself here.
There's https://en.wikipedia.org/wiki/Apple_A5 and https://en.wikipedia.org/wiki/Apple_M1 https://en.wikipedia.org/wiki/Apple_M2
What does that mean? How do you access HN through CloudFlare and what do you mean by AWS not supporting TLS1.3? You can certainly run any https server on EC2, including one that supports TLS1.3.
It's sort of like Rule 34, but for HN. "There is data of it"
The only reason we are here doing this sort of thing is because we are “Justin Bieber” fans. We aren’t here because changing a tire is interesting or unique, nor will we learn anything from it – especially this particular tire (HN is like the Toyota Corolla of “vehicles” compared to the other complex mission-critical distributed systems that make up other popular web services).
Why do you collect this? And in what format?
Hopefully it's simply that M5 didn't have a server ready and they'll migrate back.
Vultr has a great assortment of bare metal servers.
$ dig +noall +answer A news.ycombinator.com
news.ycombinator.com. 0 IN A 50.112.136.166
$ nslookup 50.112.136.166
166.136.112.50.in-addr.arpa name = ec2-50-112-136-166.us-west-2.compute.amazonaws.com.
I was on macOS when I typed it; there it's Control+Cmd+Space, and then search for "super", which gets close enough.
On my Linux machine, I can either do Compose, ^, 1, or Super+e and then search for it. (But both of these require configuration; either setting a key to be Compose (I sacrifice RAlt), or setting up whatever it is the IME I have is for Super+e.)
Our primary server died around 11pm last night (PST), so we switched to our secondary server, but then our secondary server died around 6am, and we didn't have a third.
The plan was always "in the unlikely event that both servers die at the same time, be able to spin HN up on AWS." We knew it would take us several hours to do that, but it seemed an ok tradeoff given how unlikely the both-servers-die-at-the-same-time scenario seemed at the time. (It doesn't seem so unlikely now. In fact it seems to have a probability of 1.)
Given what we knew when we made that plan, I'm pretty pleased with how things have turned out so far (fingers crossed—no jinx—definitely not gloating). We had done dry runs of this and made good-enough notes. It sucks to have been down for 8 hours, but it could have been worse, and without good backups (thank you sctb!) it would have been catastrophic.
Having someone as good as mthurman do most of the work is also a really good idea.
We wouldn't make HN decisions on that basis anyhow, though, I don't think. Maybe if all other things were literally equal.
Question: so will HN be migrating back to M5 (or another hosting provider)?
The disks were in two physically separate servers that were not connected to each other. I believe, however, that they were of similar make and model. So the leading hypothesis seems to be that perhaps the SSDs were from the same manufacturing batch and shared some defect. In other words, our servers were inbred! Which makes me want to link to the song 'Second Cousin' by Flamin' Groovies.
The HN hindsight consensus, to judge by the replies to https://news.ycombinator.com/item?id=32026606, is that this happens all the time, is not surprising at all, and is actually quite to be expected. Live and learn!
Of course you want everything in DNS, and if your IP is supposed to be dynamic you should have provisions for automatically updating it (I have some shell script somewhere that calls nsupdate over ssh, although I looked for it the other day and couldn't find it, which is a bit disturbing.)
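For the curious, a minimal sketch of what such an nsupdate-over-ssh updater might look like (the host, zone, record name, and key path are placeholders, not anything from this thread):

$ cat update-ip.sh
#!/bin/sh
# look up the current public address, then push it into the zone via nsupdate on the DNS host
NEW_IP=$(curl -s https://checkip.amazonaws.com)
ssh dns-host nsupdate -k /etc/bind/ddns.key <<EOF
zone example.com.
update delete host.example.com. A
update add host.example.com. 300 A $NEW_IP
send
EOF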
Although this was a fun exercise to learn how lost I feel without HN. Damn.
1. Why do I state that. Because I kept reading about why DNS was created and always encountered the same parroted explanation, year after year. Something along the lines that IP addresses were constantly in flux. That may have been true when DNS was created and the www was young. But was it true today. I wanted to find out. I did experiments. I found I could use the same DNS data day after day, week after week, month after month, year after year.
Why would I care. Because by eliminating remote DNS lookups I was able to speed up the time it takes me to retrieve data from the www.^2 Instead of making the assumption that every site is going to switch IP addresses every second, minute, day or week, I assume that only a few will do that and most will not. I want to know about those sites that are changing their IP address. I want to know the reasons. When a site changes its IP address, I am alerted, as you see with today's change to HN's address. Whereas when people assume every site is frequently changing its IP address, they perform unnecessary DNS lookups for the majority of sites. That wastes time among other things. And, it seems, people are unaware when sites change addresses.
2. Another benefit for me is that when some remote DNS service goes down (this has happened several times), I can still use the www without interruption. I already have the DNS data I need. Meanwhile the self-proclaimed "experts" go into panic mode.
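As a rough, hypothetical illustration of this store-the-answers approach (not the commenter's actual tooling; sites.txt and hosts.local are made-up names), you can pre-resolve a list of names once with dig and keep the results in a hosts-style file:

$ cat prefetch.sh
#!/bin/sh
# resolve each name in sites.txt once and append the answers to a hosts-style file
while read -r name; do
  ip=$(dig +short A "$name" | grep -m1 '^[0-9]')
  [ -n "$ip" ] && printf '%s %s\n' "$ip" "$name"
done < sites.txt >> hosts.local
$ grep news.ycombinator hosts.local
50.112.136.166 news.ycombinator.com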
>We had done dry runs of this in the past,
Incredible. Actual disaster recovery.
Assuming things don't fail again in the next day or two, since we still have a lot to take care of (fingers crossed—definitely not gloating), I feel like this was pretty reasonable. We don't have a lot of dev or ops resources—few people work on HN, and only me full-time these days. The more complex one's replica architecture, the higher the maintenance costs. The simplicity of our setup has served us well in the 9 years that we've been running it, and I feel like the tradeoff of "several hours downtime once a decade" is worth it if you draw one of those risk/cost managerial whiteboard things.
Deeply deeply agree with you. Not whoever was arguing. :) My "dialup" VM in the cloud I use for assorted things has had the same IP for at least 7 years, probably longer. (Thanks Linode!) After a few years, it's honestly not that hard to remember an arbitrary 32bit int. :)
$ w3m -dump http://846235814 # ;)
kabdib> Let me narrow my guess: They hit 4 years, 206 days and 16 hours . . . or 40,000 hours. And that they were sold by HP or Dell, and manufactured by SanDisk.
mikiem> These were made by SanDisk (SanDisk Optimus Lightning II) and the number of hours is between 39,984 and 40,032...
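If you want to see where your own drives sit relative to that number, power-on hours are exposed via SMART; the device names below are only examples:

$ sudo smartctl -A /dev/sda | grep -i power_on      # SATA/SAS drives
$ sudo nvme smart-log /dev/nvme0 | grep -i power_on # NVMe drives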
Just run a DNS server locally configured to serve stale records if upstream is unavailable.
As for your first point, the same local DNS server would also provide you with lower/no latency.
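With unbound, for instance, serving stale answers when upstream is unreachable is only a few lines of config (a minimal sketch; the forwarder address is arbitrary):

$ cat /etc/unbound/unbound.conf
server:
  interface: 127.0.0.1
  prefetch: yes
  serve-expired: yes
  serve-expired-ttl: 86400
forward-zone:
  name: "."
  forward-addr: 9.9.9.9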
This is a known issue in NAS systems, and Freenas always recommended running two raid arrays with 3 disks in each array for mission critical equipment. By doing so, you can lose a disk in each array and keep on trucking without any glitches. Then if you happen to kill another disk during restriping, it would failover to the second mirrored array.
You could hotswap any failed disks in this setup without any downtime. Losing 3 drives together in a server would be highly unlikely.
Thank you for stating the truth.
> they perform unnecessary DNS lookups for the majority of sites
Is it actually unnecessary if the IPs can change? I'm fine with the extra 20ms on the access every once in a while in exchange for no mysterious failure every few years.
Really sorry that you had to learn the hard way, but this is unfortunately common knowledge :/ Way back (2004) when I was shadowing-eventually-replacing a mentor that handled infrastructure for a major institution, he gave me a rule I took to heart from then forward: Always diversify. Diversify across manufacturer, diversify across make/model, hell, if it's super important, diversify across _technology stacks_ if you can.
It was policy within our (infrastructure) group that /any/ new server or service must be buildable from at least 2 different sources of components before going live, and for mission critical things, 3 is better. Anything "production" had to be multihomed if it connected to the internet.
Need to build a new storage server or service? Get a Supermicro board _and_ a Tyan (or buy an assortment of Dell & IBM), then populate both with an assortment of drives picked randomly across 3 manufacturers, with purchases spread out across time (we used 3 months) as well as resellers. Any RAID array with more than 4 drives had to include a hot spare. For even more peace of mind, add a crappy desktop PC with a ton of huge external drives and periodically sync to that.
He also taught me that it's not done until you do a few live "disaster tests" (yanking drives out of fully powered up servers, during heavy IO. Brutally ripping power cables out, quickly plugging it back in, then yanking it out again once you hear the machine doing something, then plug back in...), without giving anyone advance notice. Then, and only then, is a service "done".
I thought "Wow, $MENTOR is really into overkill!!" at the time, but he was right.
I credit his "rules for building infrastructure" for having a zero loss track record when it comes to infra I maintain, my whole life.
Speak for yourself; some of us (at least me) find "postmortem writeups" FASCINATING!!
I read them every chance I get. Most of the time the root cause wouldn't have affected me, but, I still occasionally will read one, think "oh crap! that could have bit me too!", then add the fix to my "Standard Operating Procedures" mental model, or whatnot. Some of us are still trying to "finish the game" with zero losses. :)
I used to serve DNS data over a localhost authoritative server. Now I store most DNS data in a localhost forward proxy.
If "upstream" means third party DNS service to resolve names piecemeal while accessing the www, I do not do that.^1
1. I do utilise third party DoH providers for bulk DNS data retrieval. Might as well, because DoH allows for HTTP/1.1 pipelining. I get DNS data from a variety of sources, rather than only one.
2. If it were "BS" then that would imply I am trying to mislead or deceive. The reverse is true. I kept reading sources of information about the internet that were meant to have me believe that most DNS RRs are constantly changing. I gathered DNS data. The data suggested those sources, whether intentionally or not, could be misleading and deceptive. Most DNS RRs did not change. BS could even mean that I am lying. But if I were lying and the DNS RRs for the sites I access were constantly changing, then the system I devised for using stored DNS data would not work. That is false. It works. I have been using it for years.
FWIW, I still memorize phone numbers too! I also avoid phone calls like the plague, so in reality it's just a handful of numbers per year.
> I have a few memorised, mainly some TLDs and Internic. With those I can "bootstrap" using DNS/FTP to get any other address.
Pretty sure that's true of 90% of everyone here, since all you really need to memorize is a public DNS resolver, and 8.8.8.8 or 1.1.1.1 is a particularly easy address to remember.
Nobody claimed it didn't work. The disputed claim is that it is meaningfully faster.
Interesting! That runs in direct conflict to what I learned eons ago (pre-web) for "why DNS?". (Or maybe, it conflicts with what my faulty meat brain remembers.)
The gist was "we have DNS because without it, people would have to use numbers; people don't like numbers. DNS is primarily there to provide semantic meaning." The fact that it allows the numbers to change is.. a secondary bonus.
DNS exists for the same reason as variable names instead of "variable numbers" (like a, b, c, d, &c): for us humans to provide semantic labels to things.
(an aside, "variable number" is exactly how things are still done in math and physics. This amuses me greatly.)
I am not really a fan because I like to choose the IP address, instead of letting someone else decide. I believe in user choice.
In some cases I have found the "most optimal" IP address for me is not always the one advertised based on the location of the computer sending the query.
It is like choosing a mirror when downloading open source software. I know which mirrors I prefer. The best ones for me are not necessarily always the ones closest geographically.
As for the question, the answer is yes. Because if it did not change then the query was not needed. If it does change then I will know and I will get the new address. The small amount of time it takes to get the new address and update a textfile is acceptable to me. I may also investigate why the address changed. Why did this HN submission go to the front page, why does it have so many points and comments. Some people are interested when stuff happens. I actually like "mysterious failures" because I want to know more about the sites I visit. Whereas an extra delay every time a TTL expires, for every name, again and again, over and over, every day, that is a lot of time cumulatively. Not to mention then I have to contend with issues of DNS privacy and security. When I started weaning myself off DNS lookups, there was no zone signing and encrypted queries.
The approach I take is not for everybody. I make HTTP requests outside the browser and I read HTML with a text-only browser. I do what works best for me.
Who are the people that use HN and would notice it moved hosting, but haven’t heard of Digital Ocean?
This reminds me of Voltaire: "Common sense is not so common."
Thanks for the great comment—everything you say makes perfect sense and is even obvious in hindsight, but it's the kind of thing that tends to be known by grizzled infrastructure veterans who had good mentors in their chequered past—and not so much by the rest of us.
I fear getting karmically smacked for repeating this too often, but the more I think about it, the more I feel like 8 hours of downtime is not an unreasonable price to pay for this lesson. The opportunity cost of learning it beforehand would have been high as well.
Cheers, you are the true and literal soul of the machine embodying the best spirit of the oftentimes beautiful thing that is Post-Paul-Graham HackerNews.
Please just promise to never die.
I run my own authoritative DNS on my router (tho not localhost. interesting), and have for a long time (since I started traffic shaping to push the ACKs to the front). Like you, I've also enjoyed having superior performance over those using public servers. Everyone says "but you can use 8.8.8.8 or 1.1.1.1! they're fast!" and I (we?) smile and nod.
Just did a quick little test for this comment. Resolving with 8.8.8.8 is fast! And... also between 800% and 2500% slower than using my (and your) setup. high five
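For anyone who wants to reproduce that kind of quick comparison, dig reports the elapsed time for each query (a sketch using the resolvers discussed above):

$ for ns in 127.0.0.1 8.8.8.8; do dig news.ycombinator.com @$ns | grep 'Query time'; done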
Also, the haters don't know something that we do, which is that... sometimes 8.8.8.8 doesn't work!!!
A few weeks ago there was a website I couldn't access from a computer using 8.8.8.8. I thought, "that's odd", used dig, and it didn't resolve. From the same network I tried a different resolver -- worked. Tried 8.8.8.8 again -- fail. sshed a few hundred miles away to check 8.8.8.8 again -- working. tcpdump on the router, watched 8.8.8.8 fail to resolve in front of my eyes. About 4 minutes later, back to normal. "yes, sometimes the internet so-called gods fail."
I'm quite curious why you changed from an full authoritative setup to a proxying one. I've skimmed a handful of your past posts and agreed entirely, so we're both "right", or both wrong/broken-brained in the same way. ;-)
Is there something I could be doing to improve my already fantastic setup?
The big test will be when we hit peak load, perhaps on Monday or Tuesday.
It's a bit disconcerting how often I say "close to a decade now" now.
* Edit: we do rate-limit some accounts (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...), which throttles how much they can post to the site, but we don't throttle how quickly they can view the site.
Annoyingly, in 2000-4, I was trying to get people to understand this and failing constantly because "it makes more sense if everything is the same - less to learn!" Hilariously*, I also got the blame when things broke even though none of them were my choice or design.
(Hell, even in 2020, I hit a similar issue with a single line Ruby CLI - lots of "everything else uses Python, why is it not Python?" moaning. Because the Python was a lot faffier and less readable!)
edit: to fix the formatting
> What does that mean? How do you access HN through CloudFlare ...
As written, it seems to suggest HN is in the Cloudflare cache. But, I don't think there's a way to access the cached version if a site's not down. I wasn't around during today's outage, so I can't speak to whether a generic Cloudflare cached version of HN was available during the downtime.
DNS, as I see it, lets someone else assign the names, i.e., the semantic meaning. Thus, assuming I am an internet user in the pre-DNS era, with the advent of DNS, I do not have to keep updating a HOSTS file when new hosts come online or change their address. This reduces administrative burden. The semantic meaning was already controllable pre-DNS, via the HOSTS file.
Many times I have read the criticisms of IP addresses as justifications for DNS. For example, IP addresses are (a) difficult to type or (b) difficult to remember. I simply cannot agree with such criticisms. As time goes on, and the www gets continually more nonsensically abstracted, I like IP addresses more and more.
Though... I've also read at least one "interesting" postmortem with complex operating scenarios and thought to myself (partially joking) that their failure isn't what they thought it was. The failure was having an unnecessarily complicated architecture in the first place, with too many abstractions and too much bloat. ;-) I would have "just" written it in C++ on Debian... ;-) (I'm exaggerating)
(I know that "my way" is fantastic.. until you need to scale across people)
We use rsync for log files.
I have never in my life heard anyone claim this as the reason for DNS.
The usual reason given is twofold:
Flat /etc/hosts files were getting large enough to be annoying.
The set of all DNS records as a whole changes constantly. Individual records don't change very much. But the time between at least one record somewhere changing is very small.
Both of these things are even more true today than they were when DNS was invented.
This is before I was born, but that sounds more like the reason why /etc/hosts was invented, which predates DNS.
And thanks right back at you.
I hadn't noticed before your comment that while not in the customary way (I'm brown skinned and was born into a working class family) I've got TONS of "privilege" in other areas. :D
My life would probably be quite different if I didn't have active Debian and Linux kernel developers just randomly be the older friends helping me in my metaphorical "first steps" with Linux.
Looking back 20+ years ago, I lucked into an absurdly higher than average "floor" when I started getting serious about "computery stuff". Thanks for that. That's some genuine "life perspective" gift you just gave me. I'm smiling. :) I guess it really is hard to see your own privilege.
> 8 hours of downtime is not an unreasonable price to pay for this lesson. The opportunity cost of learning it beforehand would have been high as well.
100% agree.
I'd even say the opportunity cost would have been much higher. Additionally, 8hrs of downtime is still a great "score", depending on the size of the HN organization. (bad 'score' if it's >100 people. amazing 'score' if it's 1-5 people.)
Deploying source code is trivial these days. Large databases, not so much, unless you're already using something like RDS.
If that doesn't work, there's always the backup plan: say the magic words "scheduled maintenance", service $database stop, rsync it over, and bring it back up. The sky will not fall if HN goes down for another couple of hours, especially if it's scheduled ahead. :)
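A sketch of that maintenance-window fallback, assuming a conventional database on a plain Linux box (the service name, paths, and target host are purely illustrative):

$ sudo systemctl stop postgresql                                # "scheduled maintenance" begins
$ rsync -aHAX /var/lib/postgresql/ newhost:/var/lib/postgresql/ # copy the quiesced data directory
$ ssh newhost sudo systemctl start postgresql                   # bring it back up on the new box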
Do not let any user-generated content be accessible from any Hetzner IP, or you are pretty much one email away from a shutdown. Don't forget Germany's laws on speech too; they are nothing remotely similar to the US. I would host, for example, a corporate site just fine, but the last thing I would ever host there is a forum or image hosting site or whatever.
Comment threads and comments each have a unique item number assigned monotonically.
The file system has a directory structure something like:
|—1000000
| |-100000
| |-200000
| |-…
| |-900000
|—2000000
| |-100000
| |-200000
| |-…
| |-900000
|-…
I imagine that the comment threads (like this one), while text, are actually arc code (or a dialect of it) that is parsed into a continuation for each user to handle things like showdead, collapsed threads and hell bans.
To go further out on a wobbly limb of out-of-my-ass speculation, I suspect all the database credentialing is vanilla Unix user and group permissions, because that is the simplest thing that might work and is at least as robust as any in-database credentialing system running on Unix would be.
Though simple direct file system IO is about as robust as reads and writes get, since there's no transaction semantics above the hardware layer, it is also worth considering that lost HN comments and stale reads don't have a significant business impact.
I mean HN being down didn’t result in millions of dollars per hour in lost revenue for YC…if it stayed offline for a month, there might be a significant impact to “goodwill” however.
Anyway, just WAGing.
[0] before the great rebuild I think all the files were just in one big directory and one day there were suddenly an impractical quantity and site performance fell over a cliff.
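Purely to illustrate the bucketing guessed at above, here is one hypothetical way an item id could map onto such a path (the id and the exact layout are made up, not HN's actual scheme):

$ item=2345678
$ echo items/$(( item / 1000000 * 1000000 ))/$(( item % 1000000 / 100000 * 100000 ))/$item
items/2000000/300000/2345678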
1. This goes back to 2008 and "DNS cache poisoning". Easiest way to avoid it was to not use shared caches.
2. I created a fat stub resolver^3 that stored all the addresses for TLD nameservers, i.e., what is in root.zone,^4 inside the binary. This reduces the number of queries for any lookup by one. I then used this program to resolve names without using recursion, i.e., using only authoritative servers with the RD bit unset. Then I discovered patterns in the different permutations of lookups to resolve names, i.e., common DNS (mis)configurations. I found I could "brute force" lookups by trying the fastest or most common permutations first. I could beat the speed of 8.8.8.8 or a local cache for names not already in the cache.
3. Fat for the time. It is tiny compared to today's Go and Rust binaries.
4. Changes to root.zone were rare. Changes are probably more common today what with all the gTLDs but generally will always be relatively infrequent. Classic example of DNS data that is more static than dynamic.
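To get a feel for the non-recursive lookups described in footnote 2, something like the following works with plain dig (an illustration only, not the commenter's resolver; the second server is whichever nameserver the first referral hands back):

$ dig +norecurse @a.gtld-servers.net news.ycombinator.com A   # referral from the .com servers
$ dig +norecurse @<ns-from-that-referral> news.ycombinator.com A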
I wrote more about data loss at https://news.ycombinator.com/item?id=32030407 in case that's of interest.
E.g., it falls over due to steep traffic spikes caused by outages, when autoscaling mechanisms see previously unseen levels of load increase and enter some yo-yo oscillation pattern; a whole AZ is overloaded because all the failovers from the other failing AZ trigger at once; circuit breakers trip; instances spin up too slowly to ever pass health checks; etc. Or it can't detect something becoming glacially slow but not outright failing.
See eg https://www.theverge.com/2021/12/22/22849780/amazon-aws-is-d... & https://www.theverge.com/2020/11/25/21719396/amazon-web-serv... etc (many more examples are out there)
Hmm, that actually makes me wonder about how big it would actually be. The nature of HN (not really storing a lot of images/videos like Reddit, for example) would probably lend itself well to being pretty economical in regards to the space used.
Assuming a link of 1 Gbps, ideally you'd be able to transfer close to 125 MB/s. So that'd mean that in 5 minutes you could transfer around 37,500 MB of data to another place, though you have to account for overhead. With compression in place, you might just be able to make this figure a lot better, though that depends on how you do things.
In practice the link speeds will vary (a lot) based on what hardware/hosting you're using, where and how you store any backups and what you use for transferring them elsewhere, if you can do that stuff incrementally then it's even better (scheduled backups of full data, incremental updates afterwards).
Regardless, in an ideal world where you have a lot of information, this would boil down to a mathematical equation, letting you plot how long bringing over all of the data would take for any given DB size (for your current infrastructure/setup). For many systems out there, 5 minutes would indeed be possible - but that becomes less likely the more data you store, or the more complicated components you introduce (e.g. separate storage for binary data, multiple services, message queues with persistence etc.).
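As a quick worked instance of that equation (the database size and effective throughput here are assumed numbers):

$ db_gb=100; mb_per_s=125
$ echo "scale=1; $db_gb * 1024 / $mb_per_s / 60" | bc   # minutes to transfer
13.6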
That said, in regards to the whole container argument: I think that there are definitely benefits to be had from containerization, as long as you pick a suitable orchestrator (Kubernetes if you know it well from working in a lab setting or with supervision under someone else in a prod setting, or something simpler like Nomad/Swarm that you can prototype things quickly with).
bawolff is gonna keep trying.
I really do like static IPv4 addresses. I wish I owned one.
Do you also object to anycast?
For example, I ping 198.41.0.4. I choose to ping that address over all the others, e.g., www.google.com or whatever other people use. That is what I mean by user choice. I know the address is anycasted. Where the packets actually go is not something I get to choose. It would be neat to be able control that, e.g., if source routing actually worked on today's internet. But I have no such expectations.
How do Tor users know that an exit node IP address listed for a foreign country is not anycasted and the server is actually located somewhere else.
Maybe check against a list of anycast prefixes.
http://raw.githubusercontent.com/bgptools/anycast-prefixes/m...
drill news.ycombinator.com @108.162.192.195
Those are CF IP addresses. Before HN switched from M5 to AWS, CF was an alternative way to access HN.
echo | openssl s_client -connect 50.112.136.166:443 -tls1_3
In the past, I had a similar problem because of using hardware from the same batch. In retrospect, it's silly to be surprised they died at the same time.
Appears “mthurman” is Mark Thurman, a software engineer at Y Combinator since 2016; HN profile has no obvious clues.
echo|bssl s_client -connect 50.112.136.166:443 -min-version tls1.3
Connecting to 50.112.136.166:443
Error while connecting: TLSV1_ALERT_PROTOCOL_VERSION
94922006718056:error:1000042e:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION:/home/bssl/boringssl-refs-heads-master/ssl/tls_record.cc:594:SSL alert number 70
Version: 2.0.7 OpenSSL 1.1.1n 15 Mar 2022
Connected to 50.112.136.166
Testing SSL server news.ycombinator.com on port 443 using SNI name news.ycombinator.com
SSL/TLS Protocols:
SSLv2 disabled
SSLv3 disabled
TLSv1.0 enabled
TLSv1.1 enabled
TLSv1.2 enabled
TLSv1.3 disabled
Consider the extreme case where your service is scattered over every AWS region: here an outage of any AWS region is guaranteed to take down your service.
Compare that to the case where your service is bound to only one region: then the odds of a single region outage taking down your entire service is reduced to 1 out of however many regions AWS has (assuming each region has an equal chance of suffering an outage).
To guard against outages, the failover service has to be scattered over entirely different regions (or, even better, on an entirely different service provider... which is probably a good idea anyway).
You were just lucky enough not to have been affected by AWS outages, but many others were.
You can get a lot of resilience to failure on AWS, but simply spinning up a dedicated EC2 instance is not nearly enough.
You can't just rsync files into a fully managed RDS PostgreSQL or Elasticsearch instance. You'll probably need to do a dump and restore, especially if the source machine has bad disks and/or has been running a different version. This will take much longer than simply copying the files.
Of course you could install the database of your choice in an EC2 box and rsync all you want, but that kinda defeats the purpose of using AWS and containerizing in the first place.
Ideally, all this data would have been already backed up to AWS (or your provider of choice) by the time your primary service failed, so all your have to do is spin up your backup server and your data would be waiting for you.
(Looks like HN does just this: https://news.ycombinator.com/item?id=32032316 )
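For a managed target such as RDS, the dump-and-restore step might look roughly like this (hostnames, user, and database names are placeholders, and PostgreSQL on both ends is an assumption):

$ pg_dump -Fc -h oldhost -U app appdb > appdb.dump
# the target database must already exist (e.g. created via the RDS console or createdb)
$ pg_restore --no-owner -h mydb.abc123.us-west-2.rds.amazonaws.com -U app -d appdb appdb.dump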
This is why your systems should be designed by grizzled infrastructure veterans.
It looks like the data was already in S3: [1]
The recovery probably would have been a lot faster had they had a fully provisioned EC2 image standing by... which I'd bet they will from now on.
That is true, albeit not in all cases!
An alternative approach (that has some serious caveats) would be to do full backups of the DB directory, e.g. /var/lib/postgresql/data or /var/lib/mysql (as long as you can prevent invalid state data there) and then just starting up a container/instance with this directory mounted. Of course, that probably isn't possible with most if not all managed DB solutions out there.
Sure, though the solution where you back up the data probably won't be the same one where the new live DB will actually run, so some data transfer/IO will still be needed.
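A sketch of that mount-the-data-directory approach with a containerized PostgreSQL (the image tag and host path are assumptions, and the server major version must match the files on disk):

# /restore/pgdata holds a restored copy of the old data directory
$ docker run -d --name pg-restored -v /restore/pgdata:/var/lib/postgresql/data postgres:14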
(sctb is Scott, former HN mod)
Agree. I think I should have suffixed a /s to my comment above.
> To guard against outages, the failover service has to be scattered over entirely different regions (or, even better, on an entirely different service provider... which is probably a good idea anyway).
Something, something... the greatest trick the devil (bigcloud) ever pulled...
Then more than one failing simultaneously isn't so inconceivable.
The S3 buckets where HN is backed up to could themselves be constantly copied to other S3 buckets which could be the buckets directly used by an EC2 instance, were it ever needed in case of emergency.
That would avoid on-demand data transfer from the backup S3 buckets themselves at the time of failure.
The backup S3 buckets could also be periodically copied to Glacier for long-term storage.
That's for an all-AWS backup solution. Of course you could do this with (for example) another datacenter and tapes, if you wanted to... or another cloud provider.
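With the AWS CLI, the standby copy can be a one-liner run on a schedule (bucket names here are made up):

$ aws s3 sync s3://hn-primary-backups s3://hn-standby-backups --storage-class STANDARD_IA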
Variable names are usually idiomatic within a field/carry some semantics. e.g. k is angular wavenumber, omega is angular frequency. r is displacement. etc. They just use short names to prevent the name from distracting from the shape of the equations it's used in, so that it's easier to say things like "this behaves like a transport equation but with a source term that's proportional to the strength of the Foo field squared" or whatever.
Lots of phenomena have very similar governing equations, so downplaying the names of variables in favor of the structure/context they're used in allows for efficient transfer of intuition.
HN was running on a Xeon E5-2637 v4 [1], that is, Sandy Bridge era. A 2-core CPU serving 6M requests a day.
If the iPhone had more memory, the whole of HN could be served from it.
[1] https://www.intel.com/content/www/us/en/products/sku/64598/i...
Interesting feature comparisons: https://www.m5hosting.com/iaas-cloud/
So it's a private cloud, not M5's managed cloud environments across multiple public cloud providers.
Ever since I read someone's comment that 'HN is the fastest site they regularly visit', I've wondered if that's because they're in the western US (where HN is hosted).
I am in Western Europe and HN is decently fast, but tweakers.net and openstreetmap.org are faster.
Try Google too next time if HN’s search doesn’t find what you’re looking for:
https://www.google.com/search?q=site:news.ycombinator.com+Il...
https://docs.aws.amazon.com/AmazonCloudFront/latest/Develope...
However, HN is not using CloudFront, so this doesn't matter for evaluating why HN is not supporting TLS1.3.
There seem to have been multiple "full" outages in 2011-12 in AWS' us-east-1 region, which, granted, is the oldest AWS region and likely has a bunch of legacy stuff. By "full" outages I mean that a few core services fell over and the entire region became effectively inaccessible due to those core failures.
Far easier to spin up a few large VMs on AWS for a few hours while you fix an issue than provision identical backup dedicated servers in a colo somewhere. And you can potentially just throw money at the issue while you fix the core service.
¯\_(ツ)_/¯
Two failures within a few hours is already unlikely enough though, unless there was a common variable (which there clearly was).
We tend to over engineer things as if it’s the end of the world to take a 10 minute outage… and end up causing longer ones because of the added complexity.
The context of the above statement was the HN site, not every site that uses AWS.
Specifically, I mean that if HN uses CF, then TLS1.3 will be supported. (Before the outage I accessed HN through CF so I could use TLS1.3, because the M5-hosted site did not support it.) Whereas if HN uses AWS, then TLS1.3 may or may not be supported. As it happens, there is no support.^1
Not being more clear is on me and I apologise that the statement was misinterpreted. Nevertheless, the fact that there are other sites accessed through AWS that support TLS1.3 does not help the HN user here who wants to use TLS1.3, namely, me. That is the context of the comment: accessing HN using TLS1.3. It is not a review of AWS. It is a statement about accessing HN with TLS1.3.
1. For example, those using CloudFront CDN services.
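For what it's worth, one way to test the Cloudflare path is to pin the hostname to one of the CF addresses mentioned earlier (whether that particular edge IP still fronts HN is not guaranteed):

$ curl -sI --tlsv1.3 --resolve news.ycombinator.com:443:108.162.192.195 https://news.ycombinator.com/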
Above are the specific claims that were called "BS". One has to do with enabling me to use the www without interruption if DNS stops working.^1 The other has to do with "experts" going into panic mode.^2
Neither claim relates to something being "meaningfully faster."
1. Because I use stored DNS data.
2. Because none of them advise anyone to store DNS data, let alone use it. They opt to promote and support a system that relies on DNS to work 100% of the time.
I'm foreseeing full downtime in Frankfurt this winter, though. Germany is in a really bad position when it comes to electricity.