zlacker

Cloudflare outage on December 5, 2025
1. mixedb+Xv1 2025-12-05 22:54:13
>>meetpa+(OP)
This is an architectural problem. The Lua bug, last week's longer global outage, and a long list of earlier outages like them merely expose the flaw in the architecture underneath. The original distributed, decentralized web architecture, with heterogeneous endpoints managed by a myriad of organisations, is much more resistant to this kind of global outage. Homogeneous systems like Cloudflare will continue to cause global outages. Rust won't help; people will always make mistakes, in Rust too. A robust architecture addresses this by not allowing a single mistake to bring down a myriad of unrelated services at once.
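To put rough numbers on the blast-radius argument: a minimal sketch, assuming (hypothetically) 1,000 services and providers that fail independently of one another. The function name `blast_radius` and the provider labels are invented for illustration.

```python
def blast_radius(assignments: list[str], failed_provider: str) -> float:
    """Fraction of services that go down when one provider ships a bad change."""
    down = sum(1 for provider in assignments if provider == failed_provider)
    return down / len(assignments)

# Hypothetical: 1,000 services, all behind one shared edge provider.
homogeneous = ["provider-A"] * 1000

# Hypothetical: the same 1,000 services spread evenly over 10 independent providers.
heterogeneous = [f"provider-{i % 10}" for i in range(1000)]

print(blast_radius(homogeneous, "provider-A"))    # 1.0
print(blast_radius(heterogeneous, "provider-0"))  # 0.1
```

The sketch only models the worst-case blast radius of a single bad change. It deliberately ignores the counterpoint that each smaller operator may fail more often, so total downtime across all services does not necessarily go down.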
2. chicke+SA1 2025-12-05 23:26:59
>>mixedb+Xv1
You're not wrong, but where's the robust architecture you're referring to? The reality of providing reliable services on the internet is far beyond the capabilities of most organizations.
3. coderj+dT2 2025-12-06 14:53:17
>>chicke+SA1
I think it might be the organizational architecture that needs to change.

> However, we have never before applied a killswitch to a rule with an action of “execute”.

> This is a straightforward error in the code, which had existed undetected for many years

So they shipped an untested configuration change straight to production, and it triggered an untested code path. This is a "tell me you have no tests without telling me you have no tests" level of facepalm. I work on safety-critical software, where if we had this type of quality escape, both internal auditors and external regulators would be breathing down our necks wondering how our engineering process failed and let it through. They need to rearchitect their org to put greater emphasis on verification and software quality assurance.
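The quoted postmortem lines describe a combination (a killswitch applied to an "execute" rule) that had never been exercised. A minimal sketch of how an exhaustive test over every action × killswitch pair surfaces that kind of latent bug, using a hypothetical rule model: `Rule`, `evaluate_buggy`, `evaluate_fixed`, and the action names are invented for illustration and are not Cloudflare's code. The buggy version only mimics the general shape of the failure described in the quotes.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical rule model, not Cloudflare's actual code or data structures.
ACTIONS = ("block", "log", "execute")

@dataclass
class Rule:
    action: str
    killswitched: bool = False
    # An "execute" rule delegates to a nested list of rules (hypothetical).
    children: list["Rule"] = field(default_factory=list)

def evaluate_buggy(rule: Rule) -> list[str]:
    """Deliberately buggy: the killswitch skips building the nested result,
    but the execute branch still reads it afterwards."""
    nested = None
    if not rule.killswitched and rule.action == "execute":
        nested = [a for child in rule.children for a in evaluate_buggy(child)]
    if rule.action == "execute":
        return list(nested)  # TypeError if the rule was killswitched
    return [] if rule.killswitched else [rule.action]

def evaluate_fixed(rule: Rule) -> list[str]:
    """A killswitched rule is skipped entirely, whatever its action."""
    if rule.killswitched:
        return []
    if rule.action == "execute":
        return [a for child in rule.children for a in evaluate_fixed(child)]
    return [rule.action]

def failing_combinations(evaluate) -> list[tuple[str, bool]]:
    """Run every (action, killswitch) pair and report the ones that raise."""
    failures = []
    for action, killed in product(ACTIONS, (False, True)):
        children = [Rule("log")] if action == "execute" else []
        try:
            evaluate(Rule(action, killed, children))
        except Exception:
            failures.append((action, killed))
    return failures

print(failing_combinations(evaluate_buggy))  # [('execute', True)]
print(failing_combinations(evaluate_fixed))  # []
```

Enumerating the full cross product is cheap here because there are only six combinations; with more dimensions, pairwise or property-based testing is the usual compromise.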
