zlacker

[return to "Cloudflare outage on December 5, 2025"]
1. uyzstv+6m[view] [source] 2025-12-05 17:03:50
>>meetpa+(OP)
What I'm missing here is a test environment. Gradual or not, why are they deploying straight to prod? At Cloudflare's scale, there should be a dedicated room in Cloudflare HQ with a full isolated model-scale deployment of their entire system. All changes should go there first, with tests run for every possible scenario.

Only after that do you use gradual deployment, with a big red oopsie button which immediately rolls the changes back. Languages with strong type systems won't save you; good procedure will.
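
To make the "gradual deployment plus oopsie button" idea concrete, here is a minimal sketch, not a description of how Cloudflare actually deploys. Every name in it (`gradual_rollout`, `apply`, `healthy`, `rollback`) is hypothetical, and the health check is assumed to be something you supply yourself.

```python
from typing import Callable, Sequence


def gradual_rollout(
    stages: Sequence[float],
    apply: Callable[[float], None],
    healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Widen a change stage by stage; roll back on the first failed check.

    stages   -- fractions of traffic to expose, in increasing order (assumed)
    apply    -- hypothetical hook that exposes the change to that fraction
    healthy  -- hypothetical hook that returns False if something broke
    rollback -- the "big red button": undoes the change everywhere
    """
    for fraction in stages:
        apply(fraction)
        if not healthy():
            rollback()      # stop widening and undo immediately
            return False
    return True             # every stage passed its health check
```

You would call it with something like `gradual_rollout([0.01, 0.1, 1.0], apply, healthy, rollback)`. The point is only that the rollback path is wired in before the first stage ships, rather than improvised during the incident.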

◧◩
2. bombca+K81[view] [source] 2025-12-05 20:45:16
>>uyzstv+6m
They have millions of “free” subscribers; said subscribers should be the guinea pigs for rollouts; paying (read: big) subscribers can get the breaking changes later.
◧◩◪
3. bearde+kc1[view] [source] 2025-12-05 21:02:46
>>bombca+K81
This feels like such a valid solution and is how past $dayjobs released things: send to the free users, then roll out to paying users once that's proven not to blow up.
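
Roughly what that looked like, as a sketch under assumptions: the `plan` and `name` fields, the `deploy` hook, and the `free_wave_ok` check are all made up for illustration and are not anyone's real API.

```python
from typing import Callable, Iterable, List


def release_by_plan(
    customers: Iterable[dict],
    deploy: Callable[[dict], None],
    free_wave_ok: Callable[[], bool],
) -> List[str]:
    """Deploy to free-plan customers first, then to everyone else.

    Each customer is assumed to be a dict with "name" and "plan" keys.
    Returns the names that received the change, in deployment order.
    """
    customers = list(customers)
    free = [c for c in customers if c["plan"] == "free"]
    paying = [c for c in customers if c["plan"] != "free"]

    released = []
    for c in free:
        deploy(c)
        released.append(c["name"])

    # Gate: paying customers only get the change once the free wave held up.
    if not free_wave_ok():
        return released

    for c in paying:
        deploy(c)
        released.append(c["name"])
    return released
```

If `free_wave_ok()` returns False, the function stops after the free wave and the paying customers never see the change.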
◧◩◪◨
4. sznio+Yv1[view] [source] 2025-12-05 22:54:25
>>bearde+kc1
If your target is availability, that's correct.

If your target is security, then _assuming your patch is actually valid_ you're giving better security coverage to your free customers than to your paying ones.

Cloudflare targets both, and their tradeoffs seem to be set on maximizing security at the cost of availability. And it makes sense. A fully unavailable system is perfectly secure.
