zlacker

1. judge2+(OP) 2023-07-08 16:12:20
You can block the crawler on your entire site. I’m not sure it’s true that it’s primarily used “to avoid overloading your site”.
replies(1): >>blacks+gb
2. blacks+gb 2023-07-08 17:18:40
>>judge2+(OP)
For sure, since those directives in your robots.txt don't actually compel crawlers to do anything. They're more like a polite request, and plenty of bots ignore or 'accidentally' overstep them. I do think they still have some value, and not just as a handy list of high-value targets - you may know that some part of your site has a bunch of similar links that it doesn't make sense to crawl or index (though there's always noindex/nofollow...), or that some pages (/account/preferences etc.) just don't make sense for bots to be visiting. Extending the standard to cover AI training isn't a terrible idea, but it does seem like too little, too late.
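To make the "polite request" point concrete, here's a rough sketch using Python's standard urllib.robotparser. The bot names (ExampleAIBot, SomeOtherBot), the example.com URLs and the rules themselves are made up for illustration; the only thing taken from above is the /account/ path. Note that nothing here is enforced by the server: a crawler only honors the rules if it chooses to run a check like this before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one (made-up) AI crawler is blocked site-wide,
# everyone else is only asked to stay out of /account/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /account/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler would run this check itself before each request.
print(allowed("ExampleAIBot", "https://example.com/blog/post"))            # False
print(allowed("SomeOtherBot", "https://example.com/blog/post"))            # True
print(allowed("SomeOtherBot", "https://example.com/account/preferences"))  # False
```

A bot that skips the check can still request every one of those URLs, which is why actually keeping a crawler out takes something server-side (auth, rate limiting, blocking by user agent or IP).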
replies(1): >>lakome+mg1
3. lakome+mg1 2023-07-09 01:35:22
>>blacks+gb
robots.txt tells the search engine which content is relevant. That's all.