
Cloudflare outage on December 5, 2025
1. w10-1+aw 2025-12-05 17:47:25
>>meetpa+(OP)
Kudos to Cloudflare for clarity and diligence.

When talking of their earlier Lua code:

> we have never before applied a killswitch to a rule with an action of “execute”.

I was surprised that a rules-based system was not tested completely. Perhaps that is because the Lua code is legacy relative to the newer Rust implementation?

It tracks what I've seen elsewhere: quality engineering can't keep up with production engineering. It's just that I think of Cloudflare as an infrastructure place, where that shouldn't be true.

I had a manager who came from defense electronics in the 1980s. He said that in that context, the quality engineering team was always in charge and always more skilled. For him, software is backwards.

◧◩
2. ifwint+Gi2 2025-12-06 07:57:57
>>w10-1+aw
It's weird reading these reports because they don't seem to test anything at all (or at least there's very little mention of testing).

Canary deployment, testing environments, unit tests, integration tests, anything really?

It sounds like they test by merging directly to production, but surely they don't.

◧◩◪
3. Dumble+sp2 2025-12-06 09:29:19
>>ifwint+Gi2
In the post they say they observed errors in their testing environment, but decided to ignore because they were rolling out a security fix. I am sure there is more nuance to this, but I don’t know whether that makes it better or worse.
◧◩◪◨
4. misswa+sN2 2025-12-06 13:58:38
>>Dumble+sp2
> but decided to ignore because they were rolling out a security fix.

A key part of secure systems is availability...

It really looks like vibe-coding.
