zlacker

Tell HN: We should start to add “ai.txt” as we do for “robots.txt”
1. samwil+H5 2023-05-10 12:56:05
>>Jeanne+(OP)
Using robots.txt as a model for anything doesn't work. A robots.txt is only a polite request to follow the rules in it; there is no "legal" agreement to follow those rules, only a moral imperative.
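
To illustrate that robots.txt is only a request, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The check runs inside the crawler, and nothing on the server side forces it to happen. The robots.txt content and the `allowed` helper are hypothetical, and the rules are parsed from a string so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; parsed from a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Honoring robots.txt is purely a client-side choice: a crawler that
    never calls a check like this can still request any URL it likes.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(allowed("PoliteBot", "https://example.com/private/page"))  # False
    print(allowed("PoliteBot", "https://example.com/public/page"))   # True
```

A polite crawler skips `/private/page` because it asked; an impolite one simply never asks.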

Robots.txt has failed as a system, if it hadn't we wouldn't have captchas or Cloudflare.

In the age of AI, we need to better understand where copyright applies, and we may need copyright reform to align legislation with what the public wants. We need legal test cases.

What I struggle with somewhat is that after 20-30 years of calls for shorter copyright terms and fewer restrictions on publicly accessible content and what you can do with it, the arguments are now quickly leaning the other way. "We" now want stricter copyright law when it comes to AI, but at the same time shorter copyright duration...

In many ways an ai.txt would be worse than doing nothing, as it would be a meaningless veneer: ignored in practice, but pointed to as the answer.

2. brooks+G8 2023-05-10 13:10:27
>>samwil+H5
> Robots.txt has failed as a system, if it hadn't we wouldn't have captchas or Cloudflare.

Failing to solve every problem does not mean a solution is a failure.

From sunscreen to seatbelts, the world is full of great solutions that occasionally fail simply because, across large enough numbers, some failures are statistically inevitable.

3. samwil+U9 2023-05-10 13:16:31
>>brooks+G8
OK, fair point; I may be a little hyperbolic. But my point is that it's not a system that we should copy for preventing the use of content in training AI. It would become a useless distraction.

If you "violate" a robots.txt, the server administrator can choose to block your bot (if they can fingerprint it) or your IP (if it's static).
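
A rough sketch of that server-side response, assuming a Python request handler that can see the client's `User-Agent` header and IP address. The blocklist entries and the `should_block` function are hypothetical; `203.0.113.0/24` is a reserved documentation range, not a real crawler's network. Both signals are weak in the way the comment implies: user-agent strings are self-reported, and an IP block only helps while the crawler's address stays the same.

```python
import ipaddress

# Hypothetical blocklists an administrator might maintain (illustrative values only).
BLOCKED_AGENT_TOKENS = ("badbot",)
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range

def should_block(user_agent: str, client_ip: str) -> bool:
    """Return True if the request matches a fingerprinted bot or a blocked network."""
    agent = user_agent.lower()
    # Fingerprinting by user agent: trivially bypassed by a bot that lies about its name.
    if any(token in agent for token in BLOCKED_AGENT_TOKENS):
        return True
    # Blocking by address: only effective if the bot keeps using the same IPs.
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)

if __name__ == "__main__":
    print(should_block("BadBot/2.0", "198.51.100.7"))   # True (user agent matches)
    print(should_block("Mozilla/5.0", "203.0.113.42"))  # True (address in blocked range)
    print(should_block("Mozilla/5.0", "198.51.100.7"))  # False
```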

With an ai.txt, there is no potential downside to violating it, unless new legislation gives it legal standing. Because ML models are opaque about exactly what content they were trained on, there is no obvious route to retaliation or retribution.

4. jefftk+re 2023-05-10 13:38:33
>>samwil+U9
> It's not a system that we should copy for preventing the use of content in training AI

I don't see the OP saying anything about "ai.txt" being for that? They're advocating it as a way that AIs could use fewer tokens to understand what a site is about.

(Which I also don't think is a good idea, since we already have lots of ways of including structured metadata in pages, but the main problem is not that crawlers would ignore it.)
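
As an example of the existing metadata mechanisms mentioned here, a crawler can already read a page's `<meta name="description">` and Open Graph `<meta property="og:title">` tags with Python's standard-library `html.parser`. The sample page and the `page_metadata` helper are illustrative assumptions, not something proposed in the thread.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> and <meta property=... content=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            self.meta[key] = content

def page_metadata(html: str) -> dict[str, str]:
    """Return the page's meta-tag metadata as a dict."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta

# Hypothetical sample page.
SAMPLE = """<html><head>
<title>Example</title>
<meta name="description" content="A short summary of the site.">
<meta property="og:title" content="Example Site" />
</head><body></body></html>"""

if __name__ == "__main__":
    print(page_metadata(SAMPLE))
```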

5. kmoser+jL 2023-05-10 15:57:03
>>jefftk+re
Not only do we already have lots of ways of including structured metadata, but if you want to include directives about what should/shouldn't be scraped and by whom, we already have robots.txt.

In other words, there's no need to create an ai.txt when the robots.txt standard can just be extended.
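
The point that robots.txt can already single out particular crawlers can be sketched with Python's standard-library `urllib.robotparser`. The crawler name `ExampleAIBot` and the rules below are hypothetical; any real AI crawler would have its own user-agent token, and honoring the rules remains up to the crawler, as noted upthread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group names a specific (made-up) AI crawler,
# and the catch-all group applies to every other crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Ask the standard-library parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(can_crawl("ExampleAIBot/1.0", "https://example.com/articles/1"))   # False
    print(can_crawl("SearchBot/2.1", "https://example.com/articles/1"))      # True
    print(can_crawl("SearchBot/2.1", "https://example.com/admin/settings"))  # False
```

Per-crawler groups like this are the existing mechanism that an AI-specific directive could build on, rather than a separate file.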
