1. vore+(OP) 2023-07-08 06:09:15
To steelman this: I think they’re angling for a mechanism to indicate that content is OK to index but not OK to use as AI training data. Maybe you could fudge it today with user agents in robots.txt, but who knows what the concrete proposal is.
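
A rough sketch of the user-agent fudge, using Python's standard `urllib.robotparser`. The bot names `ExampleAITrainingBot` and `ExampleSearchBot` and the example.com URL are made up for illustration, and this only works if the training crawler identifies itself honestly and chooses to obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI training crawler,
# allow everyone else, including ordinary search crawlers.
ROBOTS_TXT = """\
User-agent: ExampleAITrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Both user-agent strings are illustrative assumptions, not real crawlers.
search_allowed = can_fetch("ExampleSearchBot", "https://example.com/post")
training_allowed = can_fetch("ExampleAITrainingBot", "https://example.com/post")
```

Note that this is purely advisory: nothing stops a crawler from ignoring the file or from reusing data that a search crawler collected.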
replies(2): >>varenc+i3 >>Aerroo+r9
2. varenc+i3 2023-07-08 06:50:33
>>vore+(OP)
robots.txt is already outmoded. It can only indicate that content can’t be crawled, but a URL marked this way can still be indexed. As Google says, “it is not a mechanism for keeping a web page out of Google” [0]. You need to use something other than robots.txt to prevent indexing.

[0] https://developers.google.com/search/docs/crawling-indexing/...
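
For illustration, one of those "other things" is a `noindex` directive, sent either as a robots meta tag or as an `X-Robots-Tag` response header. A minimal sketch as a bare WSGI app; the `/private/` prefix and the page body are assumptions made up for the example. The crawler has to be able to fetch the page to see the header, so the URL must not also be disallowed in robots.txt.

```python
# Hypothetical path prefix for pages that should stay out of search results.
PRIVATE_PREFIX = "/private/"


def app(environ, start_response):
    """Minimal WSGI app that marks some pages as noindex via a response header."""
    headers = [("Content-Type", "text/html; charset=utf-8")]
    path = environ.get("PATH_INFO", "")
    if path.startswith(PRIVATE_PREFIX):
        # Ask compliant search engines not to index this response.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<html><body>placeholder page</body></html>"]


def response_headers(path: str) -> dict:
    """Call the app directly and return the headers it would send for path."""
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    app({"PATH_INFO": path}, start_response)
    return captured
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it from the standard library.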

replies(1): >>dazc+26
3. dazc+26 2023-07-08 07:25:16
>>varenc+i3
Indeed, having pages indexed which can't then be crawled is a great way of shooting yourself in the foot.
replies(1): >>floomk+O51
4. Aerroo+r9 2023-07-08 08:04:43
>>vore+(OP)
This seems weird to me, though. Aren't search engines something very similar to AI, if not outright AI?
5. floomk+O51 2023-07-08 16:32:31
>>dazc+26
I think you meant it's a great way for Google to punish you for not giving them full access.