Cloudflare outage on December 5, 2025

>>meetpa+(OP)
This is an architectural problem. The Lua bug, the longer global outage last week, and a long list of earlier such outages only uncover the problem with the architecture underneath. The original, distributed, decentralized web architecture, with heterogeneous endpoints managed by a myriad of organisations, is much more resistant to this kind of global outage. Homogeneous systems like Cloudflare will continue to cause global outages. Rust won't help; people will always make mistakes, in Rust too. A robust architecture addresses this by not allowing a single mistake to bring down a myriad of unrelated services at once.

>>mixedb+Xv1
I’m not sure I share this sentiment.

First, let’s set aside the separate question of whether monopolies are bad. They are not good but that’s not the issue here.

As to architecture:

Cloudflare has had some outages recently. However, what’s their uptime over the longer term? If an individual site took on the infra challenges themselves, would they achieve better? I don’t think so.

But there’s a more interesting argument in favour of the status quo.

Assuming Cloudflare’s uptime is above average, outages affecting everything at once are actually better for the average internet user.

It might not be intuitive but think about it.

How many Internet services does someone depend on to accomplish something such as their work over a given hour? Maybe 10 directly, and another 100 indirectly? (Make up your own answer, but it’s probably quite a few).

If everything goes offline for one hour per year at the same time, then a person is blocked and unproductive for an hour per year.

On the other hand, if each service experiences the same hour per year of downtime but at different times, then the person is likely to be blocked for closer to 100 hours per year.
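
The arithmetic behind this comparison can be sketched in a few lines of Python. This is a rough model rather than a measurement: the function names are hypothetical, the figures (100 services, one hour each) are the ones from the comment, and the staggered case assumes that outages never overlap and that any single outage fully blocks the person, so it is an upper bound.

```python
def blocked_hours_correlated(num_services: int, hours_down_each: float) -> float:
    """All services fail at the same moment, so their outages overlap completely."""
    return hours_down_each if num_services > 0 else 0.0


def blocked_hours_staggered(num_services: int, hours_down_each: float) -> float:
    """Upper bound: outages never overlap and any one outage blocks the person."""
    return num_services * hours_down_each


# Figures from the comment: 100 dependencies, one hour of downtime each per year.
print(blocked_hours_correlated(100, 1.0))  # 1.0
print(blocked_hours_staggered(100, 1.0))   # 100.0
```

If the person can keep working when some services are down, the staggered figure drops well below this bound.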

It’s not really a bad end-user experience that every service uses Cloudflare. It’s more a question of why Cloudflare’s stability seems to be going downhill.

And that’s a fair question. Because if their reliability is below average, then the value prop evaporates.

>>tobyjs+KD1
That's an interesting point, but in many (most?) cases productivity doesn't depend on all services being available at the same time. If one service goes down, you can usually be productive by using an alternative (e.g. if HN is down you go to Reddit, if email isn't working you catch up with Slack).

>>kjgkjh+CX1
Many (I’d speculate most) workflows involve moving and referencing data across multiple applications. For example, reading from a spreadsheet while writing a Notion page, then sending a link in Slack. If any one app is down, the task is blocked.
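
One way to put numbers on a chained workflow like this is to multiply the availability of each app it touches. The sketch below is illustrative only: `workflow_availability` is a hypothetical helper, the 99.9% figures are made up, and it assumes the apps fail independently of one another.

```python
import math


def workflow_availability(app_availabilities: list[float]) -> float:
    """Chance that every app in the chain is up, assuming independent failures."""
    return math.prod(app_availabilities)


# Hypothetical figures: spreadsheet, Notion, and Slack each at 99.9% availability.
apps = [0.999, 0.999, 0.999]
blocked_fraction = 1 - workflow_availability(apps)
print(f"Task blocked about {blocked_fraction:.2%} of the time")
```

Under these assumptions, each extra app in the chain lowers the share of time the whole task can be completed.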

Software development is a rare exception to this. We’re often writing from scratch (same with designers, and some other creatives). But these are definitely the exception compared to the broader workforce.

The same concept applies to any app that’s built on top of multiple third-party vendors (increasingly common for critical dependencies of SaaS).
