zlacker

[return to "Google to explore alternatives to robots.txt"]
1. Kwpols+ab[view] [source] 2023-07-08 07:51:20
>>skille+(OP)
Why would AI need a new standard for excluding it? Just add a "Googlebot-AI" user agent to your list [0] and respect these rules when crawling content for use in AIs, and convince OpenAI and Bing to do the same.

[0] https://developers.google.com/search/docs/crawling-indexing/...
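
Concretely, a site that wanted to opt out of AI crawling while keeping regular search could add something like this to its robots.txt (purely illustrative; "Googlebot-AI" is just the hypothetical token suggested above, not a token Google actually recognizes):

    # Hypothetical: "Googlebot-AI" is the made-up token from the comment
    # above, not a real Google crawler.
    User-agent: Googlebot-AI
    Disallow: /

    # Everything else, including regular Googlebot, is unaffected:
    User-agent: *
    Disallow: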

2. bastaw+CL[view] [source] 2023-07-08 14:07:42
>>Kwpols+ab
I have no insight, but I suspect it's a question of context: regular old search is about whether a page is indexed or not. Either a URL is part of the index or it isn't. But with AI, there are important questions about what's in those URLs.

I think Google is probably thinking hard about the problem of training AI: you don't want to train on the output of other AI. That doesn't mean the content shouldn't be processed, just that it shouldn't be used for training. Or maybe it's worth flagging that some content is derived from content you've manually produced yourself, versus content derived from the content of third parties.

Said another way, I expect that Google isn't just implementing a new allowlist/denylist. It's likely about exposing new information about content.
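
As a purely hypothetical sketch (not anything Google has proposed), that richer signal could look less like a blocklist and more like per-content annotations, e.g.:

    # Entirely made-up syntax, only to illustrate the kind of per-content
    # signal described above; none of these directives are real.
    User-agent: Googlebot-AI
    Allow: /blog/
    AI-Training: disallow                            # may be processed, but not trained on
    Content-Origin: /blog/ first-party               # manually produced by this site
    Content-Origin: /digest/ third-party             # derived from others' content
    Content-Origin: /bot-answers/ machine-generated  # AI output, skip for training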
