zlacker

[return to "Google to explore alternatives to robots.txt"]
1. voytec+V3[view] [source] 2023-07-08 06:20:02
>>skille+(OP)
Seems like it's intended for content stealing from every place that doesn't immediately implement Google's New Web Order as an addition to robots.txt.

"Your do not enter sign uses a font we don't like, so we'll just ignore it."

◧◩
2. Ferret+fh[view] [source] 2023-07-08 09:04:59
>>voytec+V3
To be clear, robots.txt is not legally binding and Google is not bound to follow it. In fact, I believe Google doesn't follow it and hasn't for a very long time, for the simple reason that many sites' robots.txt files are wrong.

The intent of robots.txt is to help crawlers: for example, to keep them from getting stuck in a recursive loop of dynamic pages, or from crawling pages with no value. robots.txt is not for banning, restricting, or hindering crawlers.
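
For anyone who hasn't looked at it from the crawler side, here is a rough sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` agent name, and the example.com URLs are all made up for illustration; nothing here describes what Google's crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
# It steers crawlers away from an endlessly paginated calendar.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
"""

# Hypothetical crawler name, not a real user agent.
AGENT = "ExampleBot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The crawler asks before each fetch; honoring the answer is voluntary.
for url in (
    "https://example.com/about",
    "https://example.com/calendar/2023/07/08",
):
    verdict = "fetch" if parser.can_fetch(AGENT, url) else "skip"
    print(verdict, url)
```

Note that the check lives entirely in the crawler: the server never enforces anything, which is why the file works as guidance rather than as access control.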

◧◩◪
3. superk+uS[view] [source] 2023-07-08 14:55:04
>>Ferret+fh
That's just because Google is a corporate person who is more equal than a human person. Human persons, at least in the USA, get charged under the CFAA (18 U.S.C. § 1030) if they're using non-browser tools to access the public website of someone with power and if they happen to rock the boat (like weev with wget).

That's not to say that I disagree. In most cases robots.txt is not legally binding. It only becomes a legal danger to not follow it when the person running the site has power and can buy a DA to indict you.

◧◩◪◨
4. rafark+Ln1[view] [source] 2023-07-08 17:57:26
>>superk+uS
If a tool can access a URL, does that not make it a browser?
◧◩◪◨⬒
5. TeMPOr+WZ1[view] [source] 2023-07-08 22:10:36
>>rafark+Ln1
Not under any but the narrowest of meanings, i.e. "can follow URLs / can talk HTTP". By itself, it's not a browser to users, it's not a browser to software developers, and it's definitely not a browser to lawyers and judges.
◧◩◪◨⬒⬓
6. rafark+c92[view] [source] 2023-07-08 23:40:43
>>TeMPOr+WZ1
Is there a legal definition of a web browser though? I think it’s an interesting topic.