But this is Scaling-101 stuff. It's not some super complex or unique system going wrong. At least according to the article, it's a classic case of bad retry logic leading to a death spiral.
Ironic that someone saying it's scaling 101 follows up the comment with a completely wrong explanation.
In my mind, it is much closer to needlessly asking every server for the same information, because the requests are most likely load balanced. Admittedly, I don't know the load balancing strategy. Even so, isn't it likely that those retries are hitting multiple servers?
This specific problem we're discussing, of concurrent client retries effectively launching a self-imposed DDoS attack, isn't exactly the thundering herd problem. It's clients and servers instead of threads, for one thing. But it's a good enough analogy to another type of cascading failure in concurrent computing, IMO.
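For anyone wondering what less harmful retry logic might look like, here is a minimal sketch of capped exponential backoff with jitter, a common way to keep clients from retrying in lockstep. It is not taken from the article. The names `backoff_delay` and `call_with_retries` and the default values for `base`, `cap`, and `max_attempts` are assumptions made up for illustration.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random.random):
    """Delay in seconds before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`, and the actual delay
    is drawn uniformly below that ceiling so clients spread out over time.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(operation, max_attempts=5, base=0.1, cap=10.0,
                      sleep=time.sleep, rng=random.random):
    """Call `operation` until it succeeds or `max_attempts` calls have failed."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up rather than retrying forever
            sleep(backoff_delay(attempt, base, cap, rng))
```

The two things that matter here are the bounded number of attempts and the randomized delay. Without them, every client that saw the same failure retries at the same moment, which is the self-inflicted load spike described above.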