zlacker

Ok, fair point, I may be being a little hyperbolic. But my point is that it's not a system that we should copy for preventing the use of content in training AI. It would become a useless distraction.

If you "violate" a robots.txt the server administrator can choose to block your bot (if they can fingerprint it) or IP (if it's static).

With an ai.txt there is no potential downside to violating it - unless we get new legislation enforcing its legal standing. The nature of ML models is that it's opaque exactly what content they were trained on, so there is no obvious retaliation or retribution.

replies(4): >>Wowfun+j2 >>Burnin+Z2 >>capabl+83 >>jefftk+x4

>>samwil+(OP)
> But my point is that it's not a system that we should copy for preventing the use of content in training AI.

I don't think that's what OP is envisioning based on their post!

>>samwil+(OP)
OP is trying to give helpful info to the AI, not set boundaries for it.

>>samwil+(OP)
> But my point is that it's not a system that we should copy for preventing the use of content in training AI

The purpose OP is suggesting in the submission is the opposite: to help AI crawlers understand what the page/website is about without actually having to infer the purpose from the content itself.

replies(1): >>Xelyne+f7

>>samwil+(OP)
> It's not a system that we should copy for preventing the use of content in training AI

I don't see the OP saying anything about "ai.txt" being for that? They're advocating it as a way that AIs could use fewer tokens to understand what a site is about.

(Which I also don't think is a good idea, since we already have lots of ways of including structured metadata in pages, but the main problem is not that crawlers would ignore it.)

replies(1): >>kmoser+pB

>>capabl+83
Isn't that the entire point of the semantic web?

replies(1): >>kmoser+RB

>>jefftk+x4
Not only do we already have lots of ways of including structured metadata, but if you want to include directives about what should/shouldn't be scraped and by whom, we already have robots.txt.

In other words, there's no need to create an ai.txt when the robots.txt standard can just be extended.
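
As a rough sketch of how per-crawler rules already work with plain robots.txt, here is Python's standard `urllib.robotparser` evaluating a made-up file. The user agent `ExampleAIBot`, the rules, and the `can_crawl` helper are all assumptions for illustration, not a real crawler or an agreed convention:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleAIBot" is an invented user agent,
# used here only to show that rules can target a single crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Of course this only tells a well-behaved crawler what the site owner wants; nothing here enforces it.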

>>Xelyne+f7
If only there was an HTML tag that let you provide a concise description of the page content. Perhaps something like <meta name="description" content="This is an example of a meta description. This will often show up in search results.">
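
For what it's worth, reading that tag takes very little code. A minimal sketch using Python's standard `html.parser`; the class and function names (`MetaDescriptionParser`, `extract_description`) are my own, and a real crawler would presumably want more robust handling:

```python
from html.parser import HTMLParser
from typing import Optional


class MetaDescriptionParser(HTMLParser):
    """Records the content of the first <meta name="description"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and keep the first description found.
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content")


def extract_description(html: str) -> Optional[str]:
    """Return the page's meta description, or None if it has none."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description
```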