I’m not sure I share this sentiment.

First, let’s set aside the separate question of whether monopolies are bad. They are not good but that’s not the issue here.

As to architecture:

Cloudflare has had some outages recently. However, what’s their uptime over the longer term? If an individual site took on the infra challenges themselves, would they achieve better? I don’t think so.

But there’s a more interesting argument in favour of the status quo.

Assuming Cloudflare’s uptime is above average, outages affecting everything at once are actually better for the average internet user.

It might not be intuitive but think about it.

How many Internet services does someone depend on to accomplish something such as their work over a given hour? Maybe 10 directly, and another 100 indirectly? (Make up your own answer, but it’s probably quite a few).

If everything goes offline for one hour per year at the same time, then a person is blocked and unproductive for an hour per year.

On the other hand, if each service experiences the same hour per year of downtime but at different times, then the person is likely to be blocked for closer to 100 hours per year.
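A rough way to check that arithmetic, as a minimal sketch. It assumes that outages are independent, that each service is down one hour per year, and that any single outage fully blocks the person; those are the premises of the argument above, not measured data. The function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def blocked_hours_correlated(downtime_hours: float) -> float:
    """All services fail in the same window, so the user is blocked once."""
    return downtime_hours


def blocked_hours_independent(n_services: int, downtime_hours: float) -> float:
    """Expected hours per year in which at least one of n services is down.

    Assumes outages are independent and that one outage is enough to block.
    """
    p_down = downtime_hours / HOURS_PER_YEAR
    # Probability that at least one service is down at a given moment.
    p_any_down = 1 - (1 - p_down) ** n_services
    return p_any_down * HOURS_PER_YEAR


if __name__ == "__main__":
    print(blocked_hours_correlated(1.0))
    print(blocked_hours_independent(100, 1.0))
```

With 100 services the independent case comes out just under 100 hours, because outage windows occasionally overlap. The result depends entirely on the "any outage blocks you" assumption.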

It’s not really a bad end-user experience that every service uses Cloudflare. It’s more a question of why Cloudflare’s stability seems to be going downhill.

And that’s a fair question. Because if their reliability is below average, then the value prop evaporates.

replies(18): >>gerdes+U6 >>embedd+Z9 >>Nextgr+mb >>ccakes+pc >>randme+kg >>kjgkjh+Sj >>fallou+pm >>wat100+Dt >>dfex+Dx >>atmosx+OK >>geyser+nL >>sunrun+OL >>nialse+7N >>tonyhb+4O >>hector+MP >>lxgr+fW1 >>clicke+EY1 >>chamom+C12

>>tobyjs+(OP)
All of my company's hosted web sites have way better uptimes and availability than CF but we are utterly tiny in comparison.

With only some mild blushing, you could describe us as "artisanal" compared to the industrial monstrosities, such as Cloudflare.

Time and time again we get these sorts of issues with the massive cloudy chonks and they are largely due to the sort of tribalism that used to be enshrined in the phrase: "no one ever got fired for buying IBM".

We see the dash to the cloud and the shoddy state of in-house corporate IT as a result. "We don't need in-house knowledge, we have 'MS copilot 365 office thing' that looks after itself and now it's intelligent" - yay \o/

Until I can't, I'm keeping it as artisanal as I can for me and my customers.

replies(1): >>foobar+du1

>>tobyjs+(OP)
> Cloudflare has had some outages recently. However, what’s their uptime over the longer term? If an individual site took on the infra challenges themselves, would they achieve better? I don’t think so.

Why is that the only option? Cloudflare could offer solutions that let people run their software themselves, after paying some license fee. Or there could be many companies people use, instead of everyone flocking to one because of cargo-culting "You need a CDN like Cloudflare before you launch your startup bro".

replies(2): >>Moto74+gb >>tobyjs+Ab

>>embedd+Z9
What you’re suggesting is not trivial. Otherwise we wouldn’t use various CDNs. To do what Cloudflare does your starting point is “be multiple region/multiple cloud from launch” which is non-trivial especially when you’re finding product-market fit. A better poor man’s CDN is object storage through your cloud of choice serving HTTP traffic. Cloudflare also offers layers of security and other creature comforts. Ignoring the extras they offer, if you build what they offer you have effectively made a startup within a startup.

Cloudflare isn’t the only game in town either. Akamai, Google, AWS, etc all have good solutions. I’ve used all of these at jobs I’ve worked at and the only poor choice has been to not use one at all.

>>tobyjs+(OP)
> If an individual site took on the infra challenges themselves, would they achieve better? I don’t think so.

I disagree; most people need only a subset of Cloudflare's features. Operating just that subset avoids the risk of the other moving parts (that you don't need anyway) ruining your day.

Cloudflare is also a business and has its own priorities like releasing new features; this is detrimental to you because you won't benefit from said feature if you don't need it, yet still incur the risk of the deployment going wrong like we saw today. Operating your own stack would minimize such changes and allow you to schedule them to a maintenance window to limit the impact should it go wrong.

The only feature Cloudflare (or its competitors) offers that can't be done cost-effectively yourself is volumetric DDoS protection where an attacker just fills your pipe with junk traffic - there's no way out of this beyond just having a bigger pipe, which isn't reasonable for any business short of an ISP or infrastructure provider.

replies(1): >>Araina+Ki

>>embedd+Z9
What do you think Cloudflare’s core business is? Because I think it’s two things:

1. DDoS protection

2. Plug n’ Play DNS and TLS (termination)

Neither of those make sense for self-hosted.

Edit: If it’s unclear, #2 doesn’t make sense because if you self-host, it’s no longer plug n’ play. The existing alternatives already serve that case equally well (even better!).

replies(1): >>stingr+zc

>>tobyjs+(OP)
> If an individual site took on the infra challenges themselves, would they achieve better? I don’t think so.

The point is that it doesn’t matter. A single site going down has a very small chance of impacting a large number of users. Cloudflare going down breaks an appreciable portion of the internet.

If Jim’s Big Blog only maintains 95% uptime, most people won’t care. If BofA were at 95%.. actually same. Most of the world aren’t BofA customers.

If Cloudflare is at 99.95% then the world suffers

replies(5): >>sherma+4r >>chii+mu >>johnco+bz >>rainco+SK >>esrauc+G81

>>tobyjs+Ab
Cloudflare Zero-Trust is also very core to their enterprise business.

>>tobyjs+(OP)
> If an individual site took on the infra challenges themselves, would they achieve better? I don’t think so.

I’m tired of this sentiment. Imagine if people said, why develop your own cloud offering? Can you really do better than VMWare..?

Innovation in technology has only happened because people dared to do better, rather than giving up before they started…

>>Nextgr+mb
>The only feature Cloudflare (or its competitors) offers that can't be done cost-effectively yourself is volumetric DDoS protection

.... And thanks to AI everyone needs that all the time now since putting a site on the Internet means an eternal DDoS attack.

>>tobyjs+(OP)
That's an interesting point, but in many (most?) cases productivity doesn't depend on all services being available at the same time. If one service goes down, you can usually be productive by using an alternative (e.g. if HN is down you go to Reddit, if email isn't working you catch up with Slack).

replies(2): >>sema4h+Qo >>tobyjs+Pt

>>tobyjs+(OP)
"My architecture depends upon a single point of failure" is a great way to get laughed out of a design meeting. Outsourcing that single point of failure doesn't cure my design of that flaw, especially when that architecture's intended use-case is to provide redundancy and fault-tolerance.

The problem with pursuing efficiency as the primary value prop is that you will necessarily end up with a brittle result.

replies(1): >>lockni+NL

>>kjgkjh+Sj
If HN, Reddit, email, Slack and everything else is down for a day, I think my productivity would actually go up, not down.

replies(1): >>zqna+LD

>>ccakes+pc
Maybe the world can just live without the internet for a few hours.

There are likely emergency services dependent on Cloudflare at this point, so I’m only semi serious.

replies(2): >>lockni+XK >>p-e-w+de1

>>tobyjs+(OP)
That’s fine if it’s just some random office workers. What if every airline goes down at the same time because they all rely on the same backend providers? What if every power generator shuts off? “Everything goes down simultaneously” is not, in general, something to aim for.

replies(1): >>tazjin+Ca1

>>kjgkjh+Sj
Many (I’d speculate most) workflows involve moving and referencing data across multiple applications. For example, read from a spreadsheet while writing a notion page, then send a link in Slack. If any one app is down, the task is blocked.

Software development is a rare exception to this. We’re often writing from scratch (same with designers, and some other creatives). But these are definitely the exception compared to the broader workforce.

Same concept applies for any app that’s built on top of multiple third-party vendors (increasingly common for critical dependencies of SaaS)

>>ccakes+pc
> If Cloudflare is at 99.95% then the world suffers

If the world suffers, those doing the "suffering" need to push that complaint/cost back up the chain - to the website operator, who would push the complaint/cost up to Cloudflare.

The fact that nobody did - or just verbally complained without action - is evidence that they didn't really suffer.

In the meantime, BofA saved the cost of making their site 99.95% available themselves (presumably Cloudflare does it cheaper than they could individually). So the entire system became more efficient as a result.

replies(2): >>yfw+Zw >>lockni+jL

>>chii+mu
They didn't really suffer, or they don't have a choice?

>>tobyjs+(OP)
> If everything goes offline for one hour per year at the same time, then a person is blocked and unproductive for an hour per year.

> On the other hand, if each service experiences the same hour per year of downtime but at different times, then the person is likely to be blocked for closer to 100 hours per year.

Putting Cloudflare in front of a site doesn't mean that site's backend suddenly never goes down. Availability will now be worse - you'll have Cloudflare outages* affecting all the sites they proxy for, along with individual site back-end failures which will of course still happen.

* which are still pretty rare
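To put numbers on that, here is a minimal sketch that treats the proxy and the origin as components in series with independent failures. The 99.9% and 99% figures are illustrative assumptions, not anyone's measured uptime. The model also ignores any caching that could mask an origin failure.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def series_availability(*availabilities: float) -> float:
    """Availability of a request path where every component must be up.

    Assumes the components fail independently of each other.
    """
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def downtime_hours_per_year(availability: float) -> float:
    """Convert an availability fraction into expected hours down per year."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    proxy = 0.999   # hypothetical proxy/CDN availability
    origin = 0.99   # hypothetical origin availability
    combined = series_availability(proxy, origin)
    print(combined)  # the product, about 0.98901
    print(downtime_hours_per_year(origin))    # about 87.6 hours
    print(downtime_hours_per_year(combined))  # about 96.3 hours
```

Under these assumptions the combined path can never be more available than its weakest component, which is the point being made here.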

>>ccakes+pc
Look at it as a user (or even operator) of one individual service that isn’t redundant or safety critical: if choice A has 1/2 the downtime of choice B, you can’t justify choosing choice B by virtue of choice A’s instability.

replies(1): >>moqmar+TD

>>sema4h+Qo
During the first Cloudflare outage, StackOverflow was down too.

>>johnco+bz
That is exactly why you don't see Windows being used anymore in big corporations. /s

>>tobyjs+(OP)
CloudFlare doesn’t have a good track record. It’s the third party that caused more outages for us than any other third party service in the last four years.

>>ccakes+pc
> A single site going down has a very small chance of impacting a large number of users

How? If Github is down how many people are affected? Google?

> Jim’s Big Blog only maintains 95% uptime, most people won’t care

Yeah, and in the world with Cloudflare people don't care if Jim's Blog is down either. So Cloudflare doesn't make things worse.

replies(1): >>dns_sn+6N

>>sherma+4r
> Maybe the world can just live without the internet for a few hours.

The world can also live a few hours without sewers, water supply, food, cars, air travel, etc.

But "can" and "should" are different words.

>>chii+mu
> The fact that nobody did - or just verbally complained without action - is evidence that they didn't really suffer.

What an utterly clueless claim. You're literally posting in a thread with nearly 500 posts of people complaining. Taking action takes time. A business just doesn't switch cloud providers overnight.

I can tell you in no uncertain terms that there are businesses impacted by Cloudflare's frequent outages that started work shedding their dependency on Cloudflare's services. And it's not just because of these outages.

>>tobyjs+(OP)
On the other hand, if one site is down you might have alternatives. Or, you can do something different until the site you needed is up again. Your argument that simultaneous downtime is more efficient than uncoordinated downtime because tasks usually rely on multiple sites being online simultaneously is an interesting one. Whether or not that's true is an empirical question, but I lean toward thinking it's not true. Things failing simultaneously tends to have worse consequences.

>>fallou+pm
> "My architecture depends upon a single point of failure" is a great way to get laughed out of a design meeting.

This is a simplistic opinion. Claiming services like Cloudflare are modeled as single points of failure is like complaining that your use of electricity to power servers is a single point of failure. Cloudflare sells a global network of highly reliable edge servers running services like caching, firewall, image processing, etc. And more importantly, it serves as a global firewall that protects services against global distributed attacks. Until a couple of months ago, it was unthinkable to casual observers that Cloudflare was such an utterly unreliable mess.

replies(2): >>fallou+iv1 >>kortil+mH1

>>tobyjs+(OP)
> If everything goes offline for one hour per year at the same time, then a person is blocked and unproductive for an hour per year.

This doesn’t guarantee availability of those N services themselves though, surely? N services with a slightly lower availability target than N+1 with a slightly higher value?

More importantly, I’d say that this only works for non-critical infrastructure, and also assumes that the cost of bringing that same infrastructure back is constant or at least linear or less.

The 2025 Iberian Peninsula outage seems to show that’s not always the case.

>>rainco+SK
Terrible examples, Github and Google aren't just websites that one would place behind Cloudflare to try to improve their uptime (by caching, reducing load on the origin server, shielding from ddos attacks). They're their own big tech companies running complex services at a scale comparable to Cloudflare.

>>tobyjs+(OP)
Paraphrasing: We are setting aside the actual issue and looking for a different angle.

To me this reads as a form of misdirection, intentional or not. A monopolist has little reason to care about downstream effects, since customers have nowhere else to turn. Framing this as roll your own versus Cloudflare rather than as a monoculture CDN environment versus a diverse CDN ecosystem feels off.

That said, the core problem is not the monopoly itself but its enablers, the collective impulse to align with whatever the group is already doing, the desire to belong and appear to act the "right way", meaning in the way everyone else behaves. There are a gazillion ways of doing CDN, why are we not doing them? Why the focus on one single dominant player?

replies(1): >>citize+YP

>>tobyjs+(OP)
Cloudbleed. It’s been a fun time.

>>tobyjs+(OP)
> On the other hand, if each service experiences the same hour per year of downtime but at different times, then the person is likely to be blocked for closer to 100 hours per year.

I think the parent post made a different argument:

- Centralizing most of the dependency on Cloudflare results in a major outage when something happens at Cloudflare, it is fragile because Cloudflare becomes the single point of failure. Like: Oh Cloudflare is down... oh, none of my SaaS services work anymore.

- In a world where this is not the case, we might see more outages, but they would be smaller and more contained. Like: oh, Figma is down? fine, let me pickup another task and come back to Figma once it's back up. It's also easier to work around by having alternative providers as a fallback, as they are less likely to share the same failure point.

As a result, I don't think you'll be blocked 100 hours a year in scenario 2. You may observe 100 non-blocking inconveniences per year, vs a completely blocking Cloudflare outage.

And in observed uptime, I'm not even sure these providers ever won. We're running all our auxiliary services on a decent Hetzner box with a LB. Say what you want, but that uptime is looking pretty good compared to any services relying on AWS (Oct 20, 15 hours), Cloudflare (Dec 5 (half hour), Nov 18 (3 hours)). Easier to reason about as well. Our clients are much more forgiving when we go down due to Azure/GCP/AWS/Cloudflare vs our own setup though...

>>nialse+7N
> Why the focus on one single dominant player?

I don’t know the answer to all the questions. But here I think it is just a way to avoid responsibility. If someone chooses CDN “number 3” and it goes down, business people *might* put the blame on this person for not choosing “the best”. I am not saying it is the right approach, I have just seen it happen too many times.

replies(1): >>nialse+U02

>>ccakes+pc
I'm not sure I follow the argument. If literally every individual site had an uncorrelated 99% uptime, that's still less available than a centralized 99.9% uptime. The "entire Internet" is much less available in the former setup.

It's like saying that Chipotle having X% chance of tainted food is worse than local burrito places having 2*X% chance of tainted food. It's true in the lens that each individual event affects more people, but if you removed that Chipotle and replaced with all local, the total amount of illness is still strictly higher, it's just tons of small events that are harder to write news articles about.
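A small sketch of the two quantities this sub-thread keeps trading off: total expected downtime, and how likely it is that everything is down at once. The user count and the uptime figures are hypothetical, and the per-site failures are assumed to be independent.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def expected_user_hours_lost(users: int, availability: float) -> float:
    """Total user-hours lost per year, summed over all users."""
    return users * (1 - availability) * HOURS_PER_YEAR


def prob_all_down(n_sites: int, availability: float) -> float:
    """Chance that n independently failing sites are all down at the same moment."""
    return (1 - availability) ** n_sites


if __name__ == "__main__":
    users = 1_000_000  # hypothetical population
    # Total harm: ten times more user-hours lost at 99% than at 99.9%.
    print(expected_user_hours_lost(users, 0.99))
    print(expected_user_hours_lost(users, 0.999))
    # Simultaneous harm: one shared provider is "all down" whenever it is down,
    # while independent sites are almost never all down together.
    print(prob_all_down(1, 0.999))
    print(prob_all_down(3, 0.99))
```

The first pair of numbers supports the "total illness is higher" reading. The second pair supports the "one event hits everyone" reading. Which one matters more is what the replies are arguing about.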

replies(2): >>psycho+Ye1 >>Akrony+cx1

>>wat100+Dt
That is literally how a large fraction of airlines work. It's called Amadeus, and it did have a big global outage not too long ago.

replies(1): >>wat100+0m1

>>sherma+4r
The world dismantled landlines, phone booths, mail order catalogues, fax machines, tens of millions of storefronts, government offices, and entire industries in favor of the Internet.

So at this point no, the world can most definitely not “just live without the Internet”. And emergency services aren’t the only important thing that exists to the extent that anything else can just be handwaved away.

replies(1): >>171862+G12

>>esrauc+G81
No, it's like saying that if a single point of failure in a global food supply chain fails, nobody's going to eat today. That is in contrast to a supplier failing to provide for a local food truck today, in which case its customers will have to go to the restaurant next door.

replies(1): >>esrauc+6j1

>>psycho+Ye1
Ah ok, it is true that if there are a lot of fungible offerings, then worse but uncorrelated uptime can be more robust.

I think the question then is how much of the Internet has fungible alternatives such that uncorrelated downtime can have meaningfully less impact. If you have a "to buy" shopping list, the existence of alternative shopping list products doesn't help you. When the one you use is down, it's just down; the substitutes cannot substitute on short notice. Obviously for some things there are clear substitutes, but actually I think "has fungible alternatives" is mostly correlated with "being down for 30 minutes doesn't matter". It seems that the things where you want the one specific site are the ones where availability matters more.

replies(1): >>hunter+fr1

>>tazjin+Ca1
Which should be a good example of why this should be avoided.

>>esrauc+6j1
The restaurant-next-door analogy, representing fungibility, isn't quite right. If BofA is closed and you want to do something in person with them, you can't go to an unrelated bank. If Spotify goes down for an hour, you're not likely to become a YT Music subscriber as a stopgap even though they're somewhat fungible. You'll simply wait, and the question is: can I shuffle my schedule instead of elongating it?

A better analogy is that if the restaurant you'll be going to is unexpectedly closed for a little while, you would do an after-dinner errand before dinner instead and then visit the restaurant a bit later. If the problem affects both businesses (like a utility power outage) you're stuck, but you can simply rearrange your schedule if problems are local and uncorrelated.

replies(1): >>psycho+Xx1

>>gerdes+U6
Sorry for the downvotes, but this is true. Many times, with some basic HA, you get better uptime than the big cloud boys. Yes, their stack and tech is fancier, but we also need to factor in how much CF messes with it vs self-hosted. Anyway, the self-hosted wisdom is RIP these days and I mostly just run CF Pages / KV :)

>>lockni+NL
Your electricity to servers IS a single point of failure, if all you do is depend upon the power company to reliably feed power. There is a reason that co-location centers have UPS and generator backups for power.

It may have been unthinkable to some casual observers that creating a giant single point of failure for the internet was a bad idea but it was entirely thinkable to others.

>>esrauc+G81
Also what about individual sites having 99% uptime while behind CF with an uncorrelated uptime of 99.9%?

Just because CF is up doesn't mean the site is.

>>hunter+fr1
If utility power outage is put on the table, then the analogy is almost everyone solely relying on the same grid, in contrast with being wired to a large set of independent providers or even using their own local solar panel or whatever autonomous energy source.

>>lockni+NL
You do know that data centers use backup generators because electricity is a single point of failure right? They even have multiple power supplies plugged into different circuits.

>>tobyjs+(OP)
> If everything goes offline for one hour per year at the same time, then a person is blocked and unproductive for an hour per year.

The consequence of some services being offline is much, much worse than a person (or a billion) being bored in front of a screen.

Sure, it’s arguably not Cloudflare’s fault that these services are cloud-dependent in the first place, but even if service just degrades somewhat gracefully in an ideal case, that’s a lot of global clustering of a lot of exceptional system behavior.

Or another analogy: Every person probably passes out for a few minutes in their life at one point or another. Yet I wouldn’t want to imagine what happens if everybody got that over with at the very same time without warning…

>>tobyjs+(OP)
If you’re using 10 services and 1 goes down, there’s a 9/10 chance you’re not using it and you can switch to work on something else. If all 10 go down you are actually blocked for an hour. Even 5 years ago, I can’t recall ever being actually impacted by an outage to the extent that I was like “well, might as well just go get something to eat because everything is down”.

>>citize+YP
True. Nobody ever got fired for choosing IBM/Microsoft/Oracle/Cisco/etc. Likely an effect of stakeholder (executives/MBAs) brand recognition.

>>tobyjs+(OP)
When I’m working from home and the internet goes down, I don’t care. My poor private-equity owned corporation, think of the lost productivity!!

But if I was trying to buy insulin at 11 pm before benefits expire, or translate something at a busy train station in a foreign country, or submit my take-home exam, I would be freeeaaaking out.

The cloudflare-supported internet does a whole lot of important, time-critical stuff.

>>p-e-w+de1
In my opinion, the world actually should be able to live without the internet, but that's another matter.

replies(1): >>sherma+172

>>171862+G12
That’s what I was getting at. There’s a lot of life that can be lived offline.