zlacker

1. blackl+(OP)[view] [source] 2023-07-08 07:36:27
Why are those folks trying to sprinkle AI over everything, even when it's completely inappropriate?

There's no AI involved in web crawling. If you come to my site, I'll tell you which pages you can visit/index and which pages you can't, end of story.

Yes, there are security concerns when people put /very-secret-admin-panel in their robots.txt and thereby tell malicious actors which URLs to target. But if /very-secret-admin-panel is never linked from any public page, the bot won't encounter it, so it never needed to be in robots.txt in the first place.
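For illustration, here is a minimal sketch of that exchange using Python's standard urllib.robotparser. The robots.txt content, the bot name "ExampleBot", the example.com host, and the allowed() helper are all assumptions made up for this example, not anything taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler may fetch anything except /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) crawler may fetch the given path."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("/blog/post-1"))     # path not covered by any Disallow rule
print(allowed("/private/report"))  # path under the disallowed prefix
```

Note that the file only lists what a well-behaved bot should skip. It grants no protection by itself, which is why an unlinked admin URL is better left out of it entirely.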

Please keep it as straightforward as this and don't add any AI bullshit to one of the few remaining simple processes in web development and administration.

replies(2): >>iamphi+F >>simion+p4
2. iamphi+F[view] [source] 2023-07-08 07:45:10
>>blackl+(OP)
Perhaps they intend to provide a way to say whether or not your content can be used to train an AI model.
replies(1): >>denton+wj1
3. simion+p4[view] [source] 2023-07-08 08:27:13
>>blackl+(OP)
Maybe some websites would like to specify something more than "I allow everything". Maybe you could specify a license for the data on the page: for example, you are OK with it being used in open source research and open source AI training but not in proprietary AI, or you do not want any kind of AI/research use of the data, only search indexing.
4. denton+wj1[view] [source] [discussion] 2023-07-08 18:26:05
>>iamphi+F
Why would any webmaster allow any of their content to be used to train an AI? What's in it for them?

The deal with searchbots is that you allow indexing because you want to be found. But no such quid pro quo exists when the content is just fed into the maw of an AI trainer.
