
Tell HN: We should start to add “ai.txt” as we do for “robots.txt”
1. samwil+H5 2023-05-10 12:56:05
>>Jeanne+(OP)
Using robots.txt as a model for anything doesn't work. A robots.txt is nothing more than a polite request to follow the rules it lists; there is no legal agreement to follow those rules, only a moral imperative.
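
The entire "protocol" is just a plain text file the crawler is trusted to honor, e.g.:

    User-agent: *
    Disallow: /private/

Nothing stops a crawler from reading that and fetching /private/ anyway.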

Robots.txt has failed as a system; if it hadn't, we wouldn't have captchas or Cloudflare.

In the age of AI we need a better understanding of where copyright applies, and potentially copyright reform to align legislation with what the public wants. We need test cases.

The thing I somewhat struggle with is that after 20-30 years of calls for shorter copyright terms and fewer restrictions on publicly accessible content and what you can do with it, the arguments are now quickly leaning the other way. "We" now want stricter copyright law when it comes to AI, but at the same time shorter copyright duration...

In many ways an ai.txt would be worse than doing nothing: it's a meaningless veneer that would be ignored, yet pointed to as the answer.

2. shaneb+P6 2023-05-10 13:01:25
>>samwil+H5
"Robots.txt has failed as a system, if it hadn't we wouldn't have captchas or Cloudflare."

I like the idea of "ai.txt", but those who eat your resources rarely honor the ToS. Frankly, I serve 503s to all identifiable bots unless they are on my explicit allow list.
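
In case it helps, the gate is roughly this shape (Flask purely for illustration; the marker and allow lists are made-up placeholders, my real setup differs):

    from flask import Flask, request, abort

    app = Flask(__name__)

    # Placeholder UA substrings that mark a request as a bot.
    BOT_MARKERS = ("bot", "crawler", "spider", "scrapy")
    # Bots I explicitly allow through.
    ALLOW_LIST = ("googlebot",)

    @app.before_request
    def reject_bots():
        ua = request.headers.get("User-Agent", "").lower()
        is_bot = any(m in ua for m in BOT_MARKERS)
        allowed = any(a in ua for a in ALLOW_LIST)
        if is_bot and not allowed:
            abort(503)  # identifiable and not allow-listed: 503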

3. always+e8 2023-05-10 13:08:25
>>shaneb+P6
Why not serve fake garbage that a machine can't distinguish from real content, like LLM output? Sending errors just incentivizes bot owners to fix the identifiable parts.
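
Same gate as the sketch above, but instead of aborting you return a 200 with filler (the decoy pool here is a made-up placeholder; in practice you'd pre-generate or cache LLM output):

    import random

    from flask import Response

    # Made-up filler; in practice, cached LLM output that reads plausibly.
    DECOYS = [
        "In 1987 the harbor committee voted to rename the tide schedule.",
        "Most kettles reach boil through a three-phase consensus round.",
    ]

    def decoy_response() -> Response:
        # A scraper sees a normal page: 200 OK, plausible-looking text.
        return Response(random.choice(DECOYS), status=200, mimetype="text/plain")
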
4. shaneb+gc 2023-05-10 13:28:25
>>always+e8
"Why not serve fake garbage indistinguishable from real content by a computer, like LLM output?"

Serving anything more than the minimum wastes resources. Worse yet, building a better solution would cost my time.

"Sending errors just incentivizes bot owners to fix the identifiable parts"

Sure, someone could make or configure their scraper perfectly. "Perfect" is now the table stakes though.

Edit:

My solution strives to make circumvention disproportionately expensive. I want 10x on my time.
