zlacker

Cloudflare outage on December 5, 2025
1. mixedb+Xv1 2025-12-05 22:54:13
>>meetpa+(OP)
This is an architectural problem. The Lua bug, the longer global outage last week, and a long list of earlier such outages only uncover the problem with the architecture underneath. The original, distributed, decentralized web architecture, with heterogeneous endpoints managed by a myriad of organisations, is much more resistant to this kind of global outage. Homogeneous systems like Cloudflare will continue to cause global outages. Rust won't help; people will always make mistakes, in Rust too. A robust architecture addresses this by not allowing a single mistake to bring down a myriad of unrelated services at once.
◧◩
2. WD-42+qw1 2025-12-05 22:57:38
>>mixedb+Xv1
In other words, the consolidation on Cloudflare and AWS makes the web less stable. I agree.
◧◩◪
3. amazin+5z1 2025-12-05 23:16:11
>>WD-42+qw1
Usually I am allergic to pithy, vaguely dogmatic summaries like this, but you're right. We have traded "some sites are down some of the time" for "most sites are down some of the time". Sure, the "some" is eliding an order of magnitude or two, but this framing remains directionally correct.
◧◩◪◨
4. PullJo+VA1 2025-12-05 23:27:05
>>amazin+5z1
Does relying on larger players result in better overall uptime for smaller players? AWS is providing me better uptime than if I assembled something myself because I am less resourced and less talented than that massive team.

If so, is it a good or bad trade to have more overall uptime but when things go down it all goes down together?

◧◩◪◨⬒
5. Vorpal+iF1 2025-12-06 00:03:12
>>PullJo+VA1
From a societal view, it is worse when everything is down at once. It leads to a less resilient society. It is not great if I can't buy essentials from one store because their payment system is down (this happened to one supermarket chain in Sweden due to a hacker attack some years ago, and it took weeks to fully fix everything; then there was that whole CrowdStrike debacle globally more recently).

It is far worse if all of the competitors are down at once. To some extent you can and should keep a little bit of stock at home (water, food, medicine, ways to stay warm, etc.), but that is not practical for everything (gasoline, for example, which could have knock-on effects on the delivery of other goods).

◧◩◪◨⬒⬓
6. pas+8K2 2025-12-06 13:27:12
>>Vorpal+iF1
it's not that simple, no?

users want to do things, and if their goal depends on a complex chain of functions (provided by various semi-independent services), then the ideal setup would be to have redundant providers that users could simply "load balance" between, with the uptime state of separate high-level providers clustered (meaning that when Google is unavailable, Bing is up, and when Random Site A goes down, its payment provider goes down too, etc.).

So ideally sites would somehow sort themselves neatly into separate availability groups.

Otherwise, simply having a lot of uncorrelated downtime doesn't help (if we count the sum of downtime experienced by people). Though again, it gets complicated by the downtime percentage, because there's likely a phase shift between the state where users can mostly complete their goals and the state where they cannot because of too many cascading failures.
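
A rough way to put numbers on the chained-goal point (a minimal sketch, not a model of any real service: the 99% availability and the 5-service chain are made-up assumptions, and real outages are neither fully independent nor perfectly clustered):

```python
# Hypothetical numbers for illustration only; not measurements of any real service.
AVAILABILITY = 0.99   # assumed fraction of time each service is up
CHAIN_LENGTH = 5      # assumed number of services one user goal depends on


def success_if_independent(availability: float, chain_length: int) -> float:
    """Chance the goal works when every service in the chain fails independently."""
    return availability ** chain_length


def success_if_clustered(availability: float) -> float:
    """Chance the goal works when the whole chain is up or down together."""
    return availability


independent = success_if_independent(AVAILABILITY, CHAIN_LENGTH)
clustered = success_if_clustered(AVAILABILITY)
print(f"independent outages: goal works {independent:.1%} of the time")
print(f"clustered outages:   goal works {clustered:.1%} of the time")
```

Each service has the same downtime in both cases, but under these assumptions the user's goal succeeds roughly 95% of the time with independent outages versus 99% when the chain's outages line up.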
