[return to "Cloudflare outage on December 5, 2025"]
1. mixedb+Xv1 2025-12-05 22:54:13
>>meetpa+(OP)
This is an architectural problem. The Lua bug, the longer global outage last week, and a long list of earlier such outages only expose the problem with the underlying architecture. The original distributed, decentralized web architecture, with heterogeneous endpoints managed by a myriad of organisations, is much more resistant to this kind of global outage. Homogeneous systems like Cloudflare will continue to cause global outages. Rust won't help: people will always make mistakes, in Rust too. A robust architecture addresses this by not allowing a single mistake to bring down a myriad of unrelated services at once.
2. delusi+TE2 2025-12-06 12:42:25
>>mixedb+Xv1
What you've identified here is a core part of what the banking sector calls the "risk-based approach". Risk in that context is defined as the product of the chance of something happening and the impact of it happening. With this understanding we can make the same argument you're making, a little more clearly.

Cloudflare is really good at what they do: they employ good engineering talent, and they understand the problem. That lowers the chance of anything bad happening. On the other hand, they achieve that by unifying the infrastructure for a large part of the internet, which raises the impact.

The website operator herself might be worse at implementing and maintaining the system, which would raise the chance of an outage. Conversely, it would also only affect her website, lowering the impact.

I don't think there's anything to dispute in that description. The discussion then is whether Cloudflare's good engineering lowers the chance of an outage more than it raises the impact. In other words, the thing we can disagree about is the scaling factors; the core of the argument seems reasonable to me.
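
To make the scaling-factor question concrete, here is a minimal sketch of the chance-times-impact comparison. Every number in it is made up for illustration, and the names `risk`, `centralized_risk`, `chance_reduction` and `impact_increase` are hypothetical, not taken from any banking standard or from Cloudflare.

```python
def risk(chance: float, impact: float) -> float:
    """Risk as the product of the chance of an event and its impact."""
    return chance * impact


def centralized_risk(chance: float, impact: float,
                     chance_reduction: float, impact_increase: float) -> float:
    """Risk after centralizing: the chance is divided by one scaling
    factor and the impact is multiplied by another (both assumed)."""
    return risk(chance / chance_reduction, impact * impact_increase)


# Hypothetical baseline: a self-hosted site with a 2% chance of an
# outage and an impact of 1 unit (only that site goes down).
baseline = risk(0.02, 1.0)

# Centralizing lowers the chance 100x but raises the impact 50x.
better = centralized_risk(0.02, 1.0, chance_reduction=100.0, impact_increase=50.0)

# Centralizing lowers the chance only 10x and raises the impact 50x.
worse = centralized_risk(0.02, 1.0, chance_reduction=10.0, impact_increase=50.0)

print(better < baseline < worse)
```

Under this simple model, centralizing lowers the risk exactly when `chance_reduction` is larger than `impact_increase`, which is the part people can reasonably disagree about.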
