Cloudflare outage on December 5, 2025
1. xnorsw+33 2025-12-05 15:47:43
>>meetpa+(OP)
My understanding, paraphrased: "In order to gradually roll out one change, we had to globally push a different configuration change, which broke everything at once".

But a more important takeaway:

> This type of code error is prevented by languages with strong type systems
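To make the quoted claim concrete, here is a minimal Rust sketch. The types `RuleResult` and `Execute` and the function `results_index` are hypothetical, not Cloudflare's code. It assumes a rule result whose action payload may be missing: modelling that payload as `Option` means the compiler rejects any code that reads it without handling the absent case, whereas in a dynamically typed language the missing value only surfaces at runtime, when something tries to index it.

```rust
/// Hypothetical payload produced when a rule's action actually runs.
#[derive(Debug, Clone, PartialEq)]
pub struct Execute {
    pub results_index: usize,
}

/// Hypothetical result of evaluating one rule. If the action was skipped
/// (say, by a killswitch), there is no `Execute` payload at all.
#[derive(Debug, Clone, PartialEq)]
pub struct RuleResult {
    // `Option` puts "may be absent" into the type itself, so every reader
    // of this field has to say what happens in the `None` case.
    pub execute: Option<Execute>,
}

/// Returns the results index, or `None` when the rule was skipped.
/// Writing `rule.execute.results_index` here would not compile:
/// `Option<Execute>` has no field named `results_index`.
pub fn results_index(rule: &RuleResult) -> Option<usize> {
    rule.execute.as_ref().map(|e| e.results_index)
}
```

The type system only forces the absent case to be acknowledged; code can still opt out explicitly, for example with `.unwrap()`, which panics on `None`.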

2. jsnell+w4 2025-12-05 15:53:12
>>xnorsw+33
That's a bizarre takeaway for them to suggest, when they had exactly the same kind of bug with Rust like three weeks ago. (In both cases they had code implicitly expecting results to be available. When the results weren't available, they terminated processing of the request with an exception-like mechanism. And then they had the upstream services fail closed, despite the failing requests being to optional sidecars rather than on the critical query path.)
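The fail-closed versus fail-open distinction can be sketched in Rust as follows. The names `SidecarError`, `Response`, `handle_fail_closed`, and `handle_fail_open` are hypothetical, and the model is deliberately simplified: an optional sidecar returns either a value or an error, and the only question is whether that error should abort the whole request.

```rust
/// Hypothetical error returned by an optional sidecar lookup.
#[derive(Debug, PartialEq)]
pub enum SidecarError {
    MissingResult,
}

/// Simplified outcome of serving one request.
#[derive(Debug, PartialEq)]
pub enum Response {
    /// The request was served; `annotated` says whether sidecar data was attached.
    Served { annotated: bool },
    /// The request failed with a server error.
    Error500,
}

/// Fail closed: any sidecar error aborts the whole request.
pub fn handle_fail_closed(sidecar: Result<u32, SidecarError>) -> Response {
    match sidecar {
        Ok(_) => Response::Served { annotated: true },
        Err(_) => Response::Error500,
    }
}

/// Fail open: the sidecar is optional, so an error only drops the
/// annotation and the request continues down the critical path.
pub fn handle_fail_open(sidecar: Result<u32, SidecarError>) -> Response {
    let annotation: Option<u32> = sidecar.ok();
    Response::Served { annotated: annotation.is_some() }
}
```

Which policy is right depends on the sidecar: failing open suits enrichment that is genuinely optional, while a security check may still need to fail closed.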
3. little+C7 2025-12-05 16:04:33
>>jsnell+w4
In fairness, the previous bug (with the Rust unwrap) should never have happened: someone explicitly called the panicking function, the review didn't catch it and the CI didn't catch it.

It required a significant organizational failure to happen. These happen, but they ought to be rarer than your average bug (unless your organization is fundamentally malfunctioning, that is).
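For the `unwrap` case, here is a minimal Rust sketch of the difference between panicking and handling the error explicitly. `load_features`, `TooManyFeatures`, and the `MAX_FEATURES` limit of 200 are hypothetical stand-ins for a config loader that rejects oversized input. Because `unwrap()` is an explicit, searchable call, tooling can flag it; Clippy's `unwrap_used` lint (allowed by default) is one way to surface such calls in CI.

```rust
/// Hypothetical limit on how many features a config file may contain.
pub const MAX_FEATURES: usize = 200;

/// Error carrying the number of features that were found.
#[derive(Debug, PartialEq)]
pub struct TooManyFeatures(pub usize);

/// Parse a newline-separated feature list, rejecting oversized inputs.
pub fn load_features(raw: &str) -> Result<Vec<String>, TooManyFeatures> {
    let features: Vec<String> = raw.lines().map(str::to_owned).collect();
    if features.len() > MAX_FEATURES {
        return Err(TooManyFeatures(features.len()));
    }
    Ok(features)
}

/// The pattern under discussion: `unwrap()` turns the `Err` into a panic.
pub fn feature_count_unwrap(raw: &str) -> usize {
    load_features(raw).unwrap().len()
}

/// The explicit alternative: the caller decides what a bad file means,
/// here by keeping the previously loaded feature count.
pub fn feature_count_or_previous(raw: &str, previous: usize) -> usize {
    match load_features(raw) {
        Ok(features) => features.len(),
        Err(_) => previous,
    }
}
```

Falling back to the previous configuration is only one possible choice; the point is that the failure mode is decided by the caller rather than by a panic.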
