zlacker

1. cpncru+(OP)[view] [source] 2025-12-05 17:14:59
I've noticed that in recent months, even apart from these outages, cloudflare has been contributing to a general degradation and shittification of the internet. I'm seeing a lot more "prove you're human", "checking to make sure you're human", and there is normally at the very least a delay of a few seconds before the site loads.

I don't think this is really helping the site owners. I suspect it's mainly about AI extortion:

https://blog.cloudflare.com/introducing-pay-per-crawl/

replies(8): >>NooneA+82 >>james2+L2 >>gblarg+yi1 >>_kidli+3T1 >>pmdr+4W1 >>stef25+kJ2 >>bobbob+sg3 >>chamom+Mg3
2. NooneA+82[view] [source] 2025-12-05 17:23:17
>>cpncru+(OP)
it can't even spy on us silently, damn
3. james2+L2[view] [source] 2025-12-05 17:26:40
>>cpncru+(OP)
You call it extortion of the AI companies, but isn’t stealing/crawling/hammering a site to scrape their content to resell just as nefarious? I would say Cloudflare is giving these site owners an option to protect their content and, as a byproduct, reduce their own costs of subsidizing their thieves. They can choose to turn off the crawl protection. If they don't, that tells you that they want it, doesn’t it?
replies(1): >>cpncru+3m
◧◩
4. cpncru+3m[view] [source] [discussion] 2025-12-05 18:52:00
>>james2+L2
>You call it extortion of the AI companies, but isn’t stealing/crawling/hammering a site to scrape their content to resell just as nefarious?

You can easily block ChatGPT and most other AI scrapers if you want:

https://habeasdata.neocities.org/ai-bots
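
For illustration, a robots.txt block of this kind might look like the sketch below. This is an assumption, not taken from the linked page: `GPTBot` is the user-agent token OpenAI documents for its crawler, and `is_allowed` is a hypothetical helper showing how a well-behaved crawler could check the rules with Python's standard `urllib.robotparser`. It only has an effect if the crawler chooses to check.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block one AI crawler, allow everyone else.
# The bot name is an example; check each vendor's documentation for its token.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("GPTBot", "https://example.com/page"))
    print(is_allowed("SomeOtherBot", "https://example.com/page"))
```

Nothing here is enforced by the server. The file is a published request, and compliance is entirely up to the client.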

replies(6): >>james2+9s >>jacobg+fB >>mplewi+v51 >>chrneu+R61 >>litera+8d1 >>Sohcah+7e1
◧◩◪
5. james2+9s[view] [source] [discussion] 2025-12-05 19:16:18
>>cpncru+3m
This is just using robots.txt and asking "pretty please, don’t scrape me".

Here is an article (from TODAY) about the case where Perplexity is being accused of ignoring robots.txt: https://www.theverge.com/news/839006/new-york-times-perplexi...

If you think a robots.txt is the answer to stopping the billion-dollar AI machine from scraping you, I don’t know what to say.

replies(2): >>cpncru+Yk1 >>Aeolun+Ws1
◧◩◪
6. jacobg+fB[view] [source] [discussion] 2025-12-05 20:01:14
>>cpncru+3m
I'm guessing you don't manage any production web servers?

robots.txt isn't even respected by all of the American companies. Chinese ones (which often also use what are essentially botnets in Latin America and the rest of the world to evade detection) certainly don't care about anything short of dropping their packets.

replies(1): >>cpncru+mj1
◧◩◪
7. mplewi+v51[view] [source] [discussion] 2025-12-05 22:43:04
>>cpncru+3m
No you cannot! I blocked all of the user agents on a community wiki I run, and the traffic came back hours later masquerading as Firefox and Chrome. They just fucking lie to you and continue vacuuming your CPU.
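
A minimal sketch of why this kind of blocking fails, assuming a simple substring match on the `User-Agent` header. The blocklist, the `is_blocked` helper, and both header strings are hypothetical examples, not the wiki's actual configuration:

```python
# Hypothetical blocklist of crawler tokens, matched case-insensitively.
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot", "claudebot")


def is_blocked(user_agent_header: str) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    ua = user_agent_header.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


# A crawler that identifies itself honestly (illustrative string).
honest = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
# The same crawler claiming to be Firefox (illustrative string).
spoofed = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"

if __name__ == "__main__":
    print(is_blocked(honest))
    print(is_blocked(spoofed))
```

The header is whatever the client decides to send, so a filter like this only stops bots that announce themselves.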
replies(1): >>cpncru+Jk1
◧◩◪
8. chrneu+R61[view] [source] [discussion] 2025-12-05 22:52:47
>>cpncru+3m
this is the equivalent of asking people not to speed on your street.
◧◩◪
9. litera+8d1[view] [source] [discussion] 2025-12-05 23:35:36
>>cpncru+3m
Tell me you don't run a site without telling me you don't run a site
replies(1): >>cpncru+ok1
◧◩◪
10. Sohcah+7e1[view] [source] [discussion] 2025-12-05 23:44:24
>>cpncru+3m
How are you this naive? Do you really think scrapers give a damn about your robots.txt?
replies(1): >>cpncru+Sk1
11. gblarg+yi1[view] [source] 2025-12-06 00:21:33
>>cpncru+(OP)
There are more and more sites I can't even visit, because the "prove you're human" check isn't compatible with older web browsers, even though the website it's blocking is.
◧◩◪◨
12. cpncru+mj1[view] [source] [discussion] 2025-12-06 00:27:22
>>jacobg+fB
I have been managing production commercial web servers for 28 years.

Yes, there are various bots, and some of the large US companies such as Perplexity do indeed seem to be ignoring robots.txt.

Is that a problem? It's certainly not a problem with cpu or network bandwidth (it's very minimal). Yes, it may be an issue if you are concerned with scraping (which I'm not).

Cloudflare's "solution" is a much bigger problem that affects me multiple times daily (as a user of sites that use it), and those sites don't seem to need protection against scraping.

replies(2): >>filled+Rl1 >>kviran+dn1
◧◩◪◨
13. cpncru+ok1[view] [source] [discussion] 2025-12-06 00:35:24
>>litera+8d1
Tell me you make incorrect assumptions without specifically saying so. (Yes, you're incorrect).
◧◩◪◨
14. cpncru+Jk1[view] [source] [discussion] 2025-12-06 00:38:34
>>mplewi+v51
There shouldn't be any noticeable hit on your cpu from bots from a site like that. Are you sure it's not a DDoS?

Obviously it depends on the bot, and you can't block the scammy ones. I was really just referring to the major legitimate companies (which might not include Perplexity).

replies(1): >>litera+1m1
◧◩◪◨
15. cpncru+Sk1[view] [source] [discussion] 2025-12-06 00:39:14
>>Sohcah+7e1
The legitimate ones do, which is what I was referring to. Obviously there are bastard ones as well.
◧◩◪◨
16. cpncru+Yk1[view] [source] [discussion] 2025-12-06 00:39:51
>>james2+9s
Yes, I was referring to legitimate companies, and Perplexity doesn't seem to be one of those.
replies(1): >>albedo+GJ1
◧◩◪◨⬒
17. filled+Rl1[view] [source] [discussion] 2025-12-06 00:48:50
>>cpncru+mj1
It is rather disingenuous to backpedal from "you can easily block them" to "is that a problem? who even cares" when someone points out that you cannot in fact easily block them.
replies(1): >>cpncru+Dn1
◧◩◪◨⬒
18. litera+1m1[view] [source] [discussion] 2025-12-06 00:50:28
>>cpncru+Jk1
There is a noticeable hit, there's also a noticeable cost, and it's not a DDoS.

Not all sites can be fully cached; we've tried.

replies(1): >>cpncru+ao1
◧◩◪◨⬒
19. kviran+dn1[view] [source] [discussion] 2025-12-06 01:03:38
>>cpncru+mj1
Security almost always brings inconvenience (to everyone involved, including end users). That is part of its cost.
replies(1): >>cpncru+np1
◧◩◪◨⬒⬓
20. cpncru+Dn1[view] [source] [discussion] 2025-12-06 01:07:59
>>filled+Rl1
I was referring to legitimate ones, which you can easily block. Obviously there are scammy ones as well, and yes it is an issue, but for most sites I would say the cloudflare cure is worse than the problem it's trying to cure.
replies(1): >>oasisb+AH2
◧◩◪◨⬒⬓
21. cpncru+ao1[view] [source] [discussion] 2025-12-06 01:11:49
>>litera+1m1
I was referring to the community wiki.
◧◩◪◨⬒⬓
22. cpncru+np1[view] [source] [discussion] 2025-12-06 01:22:25
>>kviran+dn1
What security issue is actually being solved here though?
◧◩◪◨
23. Aeolun+Ws1[view] [source] [discussion] 2025-12-06 01:58:47
>>james2+9s
If someone has a robots.txt, and I want to request their page, but I want to do that in an automated way, should I open the browser to do it instead of issuing a curl request? How about if I am going to ask Claude to fetch the page for me?
replies(1): >>kentm+qB1
◧◩◪◨⬒
24. kentm+qB1[view] [source] [discussion] 2025-12-06 03:26:17
>>Aeolun+Ws1
Respect the robots.txt and don’t do it?
◧◩◪◨⬒
25. albedo+GJ1[view] [source] [discussion] 2025-12-06 05:02:45
>>cpncru+Yk1
Oh for sure. When he wrote of the AI companies that are "stealing/crawling/hammering", you thought he meant the legitimate ones that do honor robots.txt. That makes sense.
replies(1): >>cpncru+nM1
◧◩◪◨⬒⬓
26. cpncru+nM1[view] [source] [discussion] 2025-12-06 05:44:44
>>albedo+GJ1
Actually, it looks like all the major ones do honour robots.txt, including Perplexity. They seemingly get around it using Google SERPs, so they're not actually crawling or hammering the site servers (or even Cloudflare).

https://www.ailawandpolicy.com/2025/10/anti-circumvention-re...

27. _kidli+3T1[view] [source] 2025-12-06 07:42:25
>>cpncru+(OP)
the two things are unrelated...

The pay-per-crawl thing is about them thinking ahead about post-AI business/revenue models.

The way AI happened, it removed a big chunk of revenue from news companies, blogs, etc., because lots of people go to AI instead of visiting the actual third-party website.

AI currently gets the content for free from the 3rd party websites, but they have revenue from their users.

So Cloudflare is proposing that AI companies should be paying for their crawling. Cloudflare's solution would give the lost revenue back where it belongs, just through a different mechanism.

The ugly side of the story is that an open-source solution for this already existed: L402.org.

Cloudflare wants to be the first to take a piece of the pie, but instead of using the open-source version, they forked it internally and published it as their own Cloudflare-specific service.

To be completely fair, L402 requires you to solve the payment mechanism yourself, which is easy for Cloudflare because they already deal with payments.

28. pmdr+4W1[view] [source] 2025-12-06 08:29:52
>>cpncru+(OP)
In my experience it's been in recent years, not months.
◧◩◪◨⬒⬓⬔
29. oasisb+AH2[view] [source] [discussion] 2025-12-06 16:34:31
>>cpncru+Dn1
"No true Scotsman needs Cloudflare, as any true Scotsman can block AI bots themselves" is not a strong argument.
replies(1): >>cpncru+LB3
30. stef25+kJ2[view] [source] 2025-12-06 16:47:36
>>cpncru+(OP)
> I've noticed that in recent months, even apart from these outages, cloudflare has been contributing to a general degradation and shittification of the internet. I'm seeing a lot more "prove you're human", "checking to make sure you're human", and there is normally at the very least a delay of a few seconds before the site loads.

Good to know I'm not the only one

31. bobbob+sg3[view] [source] 2025-12-06 21:28:57
>>cpncru+(OP)
I've been seeing more of those "prove you're human" pages as well, but I generally assume they are there to combat a DDoS or other type of attack (or maybe AI/bots). I remember how annoying it was combating DDoS attacks or hacked sites before Cloudflare existed. I also remember how annoying CAPTCHAs were, everywhere. Cloudflare is not perfect but, on net, I think it’s been a great improvement.
32. chamom+Mg3[view] [source] 2025-12-06 21:33:05
>>cpncru+(OP)
Feel like that’s the fault of LLMs, not cloudflare
replies(1): >>cpncru+xD3
◧◩◪◨⬒⬓⬔⧯
33. cpncru+LB3[view] [source] [discussion] 2025-12-07 00:30:45
>>oasisb+AH2
But is there any actual evidence that any major AI bots are bypassing robots.txt? It looked as if Perplexity was doing this, but after looking into it further it seems that likely isn't the case. Quite often people believe single source news stories without doing any due diligence or fact checking.
◧◩
34. cpncru+xD3[view] [source] [discussion] 2025-12-07 00:47:51
>>chamom+Mg3
Looking into this more, it does indeed seem to be a cloudflare problem. It looks like cloudflare made a significant error in their bot fingerprinting, and Perplexity wasn't actually bypassing robots.txt.

https://www.perplexity.ai/hub/blog/agents-or-bots-making-sen...

To be honest I find cloudflare a much more scammy company than Perplexity. I had a DDoS attack a few years ago which originated from their network, and they had zero interest in it.
