
[return to "Tell HN: We should start to add “ai.txt” as we do for “robots.txt”"]
1. jeroen+mf 2023-05-10 13:42:39
>>Jeanne+(OP)
If AI needs explicit information and context, surely it should focus on improving its context recognition rather than trying to fix that by inserting even more training data.

Regardless, I do agree that something like a robots.txt for AI can be very useful. I'd like my website to be excluded from most AI projects and some kind of standardized way to communicate this preference would be nice, although I realize most AI projects don't exactly care about things like the wishes of authors, copyright, or ethical considerations. It's the idea that matters, really.

If I can use an ai.txt to convince the crawlers that my website contains illegal hardcore terrorist pornography to get it excluded from the datasets, that's another way to accomplish this I suppose.
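To make the robots.txt-style opt-out concrete, here is a minimal sketch of how a well-behaved crawler might honor such a file. It assumes a hypothetical ai.txt that simply reuses robots.txt syntax; no such standard exists, and the agent names, the helper `may_train_on`, and the example URL are all made up for illustration. Only Python's standard `urllib.robotparser` is used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical ai.txt body. Assumption: it borrows robots.txt syntax.
# "ExampleTrainingBot" is an invented agent name, not a real crawler.
AI_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_train_on(ai_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` is permitted to use `url` under the given rules."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed here.
    parser.parse(ai_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/post/1"
    print(may_train_on(AI_TXT, "ExampleTrainingBot", page))
    print(may_train_on(AI_TXT, "SomeOtherAgent", page))
```

As with robots.txt, this only works if the crawler chooses to check the file; nothing in it enforces the author's preference.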

◧◩
2. LawTal+rW 2023-05-10 16:45:51
>>jeroen+mf
> focus on improving its context recognition rather than trying to fix that by inserting even more training data.

That's how you improve its context recognition. You show it many contexts.

> most AI projects don't exactly care about things like the wishes of authors, copyright, or ethical considerations

Why is it 'ethical' that you get to add a bunch of restrictions to a pre-negotiated situation? You get copyright protections in trade for letting people use your work. There's a way to add restrictions - licensing - and you're looking to get the benefits of licensing, and to take away fair use rights from other people, without paying the costs of doing so.

fwiw, I copy most pages I visit and store them. The website has given me the equivalent of a pamphlet and I store it instead of discarding it when I'm finished. This way I can go back and read it again later without having to track down the author and ask for another copy. It's not AI which has me doing this, I've been doing it for decades - it's censorship that has shown me the need.

◧◩◪
3. jeroen+601 2023-05-10 17:01:46
>>LawTal+rW
> There's a way to add restrictions - licensing - and you're looking to get the benefits of licensing, and to take away fair use rights from other people, without paying the costs of doing so.

The way copyright laws work is that work is copyrighted (assuming the work is original enough, of course) by default. You don't get to use it unless you have a license. Now, of course, as an author, you can choose to add a license to your work (whether that's CC0 or GPL-3), but you don't have to.

You do have an implicit license to consume this content, but not to reproduce it. If you put all of those copies you've saved on some other public website, that's a copyright violation. Furthermore, access to privately-owned blog posts and websites is a privilege, not a right. You're not my boss, I don't have to write content for you.

The exact legal status of AI models trained on other people's unlicensed works and their output is still largely unknown. Legal professionals much more qualified than me have argued how AI models and generated work can either be completely fair use, with no need to apply any kind of copyright restriction, or how AI generated work can be classified as a derivative work, which means you need a license. There are two major lawsuits about this going on as far as I know and it'll take years for those to play out.

If it turns out that AI models and the works they produce are completely fair game, I suppose I'll need to take down my content wherever I can in order not to be a free source of training data for big tech; public datasets and the Internet Archive will still have to respond to DMCA takedowns, after all. However, I'm not all that confident that what AI is doing is all that legally okay.

I have no problem with you saving and archiving anything you want to read. I also fully support the Internet Archive and its goal. I do have a problem with these multi-billion-dollar companies scouring the internet for their money maker, giving nothing in return.
