zlacker

[return to "Cloudflare outage on December 5, 2025"]
1. paradi+q5 2025-12-05 15:56:37
>>meetpa+(OP)
The deployment pattern from Cloudflare looks insane to me.

I've worked at one of the top fintech firms. Whenever we do a config change or deployment, we are supposed to have a rollback plan ready and monitor key dashboards for 15-30 minutes.

The dashboards need to be prepared beforehand, covering the systems and key business metrics that would be affected by the deployment, and reviewed by teammates.

I've never seen a downtime longer than 1 minute while I was there, because you get a spike on the dashboard immediately when something goes wrong.

For the entire system to be down for 10+ minutes due to a bad config change or deployment is just beyond me.

◧◩
2. vlovic+Tw 2025-12-05 17:50:22
>>paradi+q5
That is also true at Cloudflare for what it’s worth. However, the company is so big that there’s so many different products all shipping at the same time it can be hard to correlate it to your release, especially since there’s a 5 min lag (if I recall correctly) in the monitoring dashboards to get all the telemetry from thousands of servers worldwide.

Comparing the difficulty of running the world’s internet traffic with hundreds of customer products with your fintech experience is like saying “I can lift 10 pounds. I don’t know why these guys are struggling to lift 500 pounds”.

◧◩◪
3. autoex+XK 2025-12-05 18:52:17
>>vlovic+Tw
> However, the company is so big that there’s so many different products all shipping at the same time it can be hard to correlate it to your release

This kind of thing would be more understandable for a company without hundreds of billions of dollars, and for one that hasn't centralized so much of the internet. If a company has grown too large and complex to be well managed and effective, and it's starting to look like a liability for large numbers of people, there are obvious solutions for that.

◧◩◪◨
4. vlovic+xM 2025-12-05 18:58:22
>>autoex+XK
Can you name a major cloud provider that doesn’t have major outages?

If this were purely a money problem it would have been solved ages ago. It’s a difficult problem to solve. Also, they’re the youngest of the major cloud providers and have a fraction of the resources that Google, Amazon, and Microsoft have.

◧◩◪◨⬒
5. autoex+JO 2025-12-05 19:08:05
>>vlovic+xM
> Can you name a major cloud provider that doesn’t have major outages?

The fact that no major cloud provider is actually good is not an argument that Cloudflare isn't bad, or even that they couldn't/shouldn't do better than they are. They have fewer resources than Google or Microsoft but they're also in a unique position that makes us differently vulnerable when they fuck up. It's not all their fault, since it was a mistake to centralize the internet to the extent that we have in the first place, but now that they are responsible for so much they have to expect that people will be upset when they fail.

◧◩◪◨⬒⬓
6. vlovic+Qm2 2025-12-06 08:55:10
>>autoex+JO
Every major cloud provider (including Cloudflare) is orders of magnitude better at keeping 9s of availability worldwide for thousands of customers than those customers are individually. The very best of those customers might be better and only rely on cloud providers for the scaling or huge amounts of infrastructure they don’t otherwise want to own, but the vast majority are actually less capable at accomplishing whatever uptime the providers already get.

Could Cloudflare do better? Sure, that’s a truism for everyone. Did they make mistakes and continue to make mistakes? Also a truism.

Trust me, they are acutely aware of people getting upset when they fail. Why do you think their CEO and CTO are writing these blog posts?
