zlacker

[return to "Google to explore alternatives to robots.txt"]
1. blackl+X9 2023-07-08 07:36:27
>>skille+(OP)
Why are those folks trying to sprinkle AI over everything, even when it's completely inappropriate?

There's no AI involved in web crawling. If you come to my site, I'll tell you which pages you can visit/index and which pages you can't, end of story.
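For illustration, here is a minimal sketch of that exchange using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, and the example.com URLs are made up for the example; a real crawler would fetch the file from the site first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: everything is allowed except /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The crawler asks before each request; the site's rules give the answer.
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post"))
print(parser.can_fetch("ExampleBot", "https://example.com/private/notes"))
```

The whole protocol is a prefix match against a short list of rules, which is the simplicity being argued for here.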

Yes, there are security concerns with people putting /very-secret-admin-panel in their robots.txt and letting malicious actors know which URLs they should target. But if /very-secret-admin-panel is never linked from any public page, the bot won't encounter it, so it never belonged in robots.txt in the first place.

Please keep it as straightforward as this and don't add any AI bullshit to one of the few remaining simple processes in web development and administration.

2. simion+me 2023-07-08 08:27:13
>>blackl+X9
Maybe some websites would like to specify something more than "I allow everything". Maybe you could specify a license for the data on the page: for example, you are OK with it being used in open source research and open source AI training but not in proprietary AI, or you do not want any kind of AI/research use of the data, only search indexing.
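To make the idea concrete, here is a rough sketch of what a per-use policy could look like. Everything in it is hypothetical: the `Allow-use` / `Disallow-use` directive names, the use categories, and the deny-by-default choice are assumptions invented for this example and are not part of robots.txt or any existing standard.

```python
from enum import Enum


class Use(Enum):
    """Hypothetical use categories, taken from the cases listed above."""
    SEARCH_INDEXING = "search-indexing"
    OPEN_RESEARCH = "open-research"
    OPEN_AI_TRAINING = "open-ai-training"
    PROPRIETARY_AI = "proprietary-ai"


# Hypothetical policy file; these directives do not exist in any standard.
EXAMPLE_POLICY = """\
Allow-use: search-indexing
Allow-use: open-research
Allow-use: open-ai-training
Disallow-use: proprietary-ai
"""


def parse_policy(text: str) -> dict[Use, bool]:
    """Map each use category to allowed/denied (denied unless stated)."""
    policy = {use: False for use in Use}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        key, _, value = line.partition(":")
        verb = key.strip().lower()
        if verb not in ("allow-use", "disallow-use"):
            continue  # ignore directives this sketch does not know
        try:
            use = Use(value.strip().lower())
        except ValueError:
            continue  # ignore unknown use categories
        policy[use] = verb == "allow-use"
    return policy


policy = parse_policy(EXAMPLE_POLICY)
print(policy[Use.OPEN_AI_TRAINING], policy[Use.PROPRIETARY_AI])
```

Like robots.txt, something along these lines would only be a statement of the site owner's wishes; it does nothing unless crawlers choose to honor it.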