zlacker

[parent] [thread] 3 comments
1. LawTal+(OP)[view] [source] 2023-05-10 16:45:51
> focus on improving its context recognition rather than trying to fix that by inserting even more training data.

That's how you improve its context recognition. You show it many contexts.

> most AI projects don't exactly care about things like the wishes of authors, copyright, or ethical considerations

Why is it 'ethical' that you get to add a bunch of restrictions to a pre-negotiated situation? You get copyright protections in trade for letting people use your work. There's a way to add restrictions - licensing - and you're looking to get the benefits of licensing, and to take away fair use rights from other people, without paying the costs of doing so.

fwiw, I copy most pages I visit and store them. The website has given me the equivalent of a pamphlet and I store it instead of discarding it when I'm finished. This way I can go back and read it again later without having to track down the author and ask for another copy. It's not AI that got me doing this - I've been doing it for decades; it's censorship that has shown me the need.
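For illustration, a minimal sketch of that kind of page-saving; the filename scheme and "archive" directory are made-up examples, not anyone's actual setup:

```python
# Toy sketch of keeping local copies of pages you were already served.
# The naming scheme and "archive" directory are hypothetical examples.
import pathlib
from urllib.parse import urlparse

def local_name(url: str) -> str:
    """Turn a URL into a flat filename for the archive."""
    parsed = urlparse(url)
    stem = (parsed.netloc + parsed.path).strip("/").replace("/", "_") or "index"
    return stem + ".html"

def archive_page(url: str, body: bytes, root: str = "archive") -> pathlib.Path:
    """Store the bytes the site handed you, like keeping a pamphlet."""
    dest = pathlib.Path(root) / local_name(url)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(body)
    return dest
```

The point is just that the copy you store is the one the server already sent you; nothing is re-fetched or re-distributed.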

replies(1): >>jeroen+F3
2. jeroen+F3[view] [source] 2023-05-10 17:01:46
>>LawTal+(OP)
> There's a way to add restrictions - licensing - and you're looking to get the benefits of licensing, and to take away fair use right from other people, without paying the costs of doing so.

The way copyright laws work is that work is copyrighted (assuming the work is original enough, of course) by default. You don't get to use it unless you have a license. Now, of course, as an author, you can choose to add a license to your work (whether that's CC0 or GPL-3), but you don't have to.

You do have an implicit license to consume this content, but not to reproduce it. If you put all of those copies you've saved on some other public website, that's a copyright violation. Furthermore, access to privately-owned blog posts and websites is a privilege, not a right. You're not my boss, I don't have to write content for you.

The exact legal status of AI models trained on other people's unlicensed works, and of their output, is still largely unknown. Legal professionals far more qualified than me have argued both that AI models and their generated work could be completely fair use, with no need for any kind of copyright license, and that AI-generated work could be classified as derivative work, which would require one. There are two major lawsuits about this going on as far as I know, and it'll take years for those to play out.

If it turns out that AI models and the works they produce are completely fair game, I suppose I'll need to take down my content wherever I can in order not to be a free source of training data for big tech; public datasets and the Internet Archive will still have to respond to DMCA takedowns, after all. However, I'm not at all confident that what these AI companies are doing is legally okay.

I have no problem with you saving and archiving anything you want to read. I also fully support the Internet Archive and its goal. I do have a problem with these multi-billion-dollar companies scouring the internet for their money maker, giving nothing in return.

replies(1): >>LawTal+A8
3. LawTal+A8[view] [source] [discussion] 2023-05-10 17:22:19
>>jeroen+F3
> You don't get to use [a copyrighted work] unless you have a license.

Not when you give it to me. "Hey, can I see your pamphlet? Sure, here's a copy."

> an implicit license to consume this content

No, copyright prevents copying, not use. There's no implicit license needed to use a work so there's no place to attach those usage restrictions. If you want me to agree to a license you need to not give me the work until I do.

You could have a ToS click-through agreement ("no training an AI on this!"), and then only serve content to logged-in users who have agreed to your conditions.
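That gating idea fits in a few lines; the session dict below is a hypothetical stand-in for whatever real session store a site would use:

```python
# Hypothetical sketch of serving content only to visitors who have
# clicked through a ToS agreement. The session dict stands in for a
# real session store; nothing here is from any particular framework.
def serve_article(session: dict, article: str) -> str:
    if not session.get("accepted_tos"):
        return "403: log in and accept the terms of service first"
    return article
```

Anyone who hasn't agreed never receives the work, so there's a real contract to point to instead of restrictions attached after the fact.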

> but not to reproduce it.

I agree - those "pamphlets" were given to me and I can't copy them for someone else. They'd have to view my collection.

> The exact legal status of AI models trained on other people's unlicensed works and their output is still largely unknown.

Sure, predicting every court in the world is a futile exercise. Surely someone will try to overreach, stretching copyright to prevent what they feel is a bad use, but it's unlikely to become law because there are already analogous, accepted uses: scanning someone's text and pulling data from it - data like which words follow which other words.

> I do have a problem with these multi billion dollar companies scouring the internet for their money maker, giving nothing in return.

Well, FB released Llama... It's not a closed technology; it's being led by for-profit businesses, but the community (which includes many of those corporate engineers) is trying to keep up.

Even if you can and do attach usage restrictions to your site, I feel it'll hurt the little guy more than the corporations. There are probably no unique linguistic constructions on your site whose absence would make a corporate AI less valuable, but for hackers, tinkerers, and eventual historians, who knows what it'll interfere with.

replies(1): >>jeroen+Vc
4. jeroen+Vc[view] [source] [discussion] 2023-05-10 17:42:23
>>LawTal+A8
>> You don't get to use [a copyrighted work] unless you have a license.

>Not when you give it to me. "Hey, can I see your pamphlet? Sure, here's a copy."

>> an implicit license to consume this content

>No, copyright prevents copying, not use. There's no implicit license needed to use a work so there's no place to attach those usage restrictions. If you want me to agree to a license you need to not give me the work until I do.

>You could have a ToS click-through agreement ("no training an AI on this!"), and then only serve content to logged-in users who have agreed to your conditions.

Fair enough, I worded that wrong.

>Sure, predicting every court in the world is a futile exercise. Surely someone will try to overreach, stretching copyright to prevent what they feel is a bad use, but it's unlikely to become law because there are already analogous, accepted uses: scanning someone's text and pulling data from it - data like which words follow which other words.

Kazaa was banned despite being very popular for a few years. The DMCA was signed into law years after the first copyright trouble started. Just because the government is slow doesn't mean they won't write new law.

> Well, FB released Llama... It's not a closed technology; it's being led by for-profit businesses, but the community (which includes many of those corporate engineers) is trying to keep up.

FB's model leaked; access was originally subject to a strict whitelist. They didn't mean for it to get out there, but they wisely chose not to fight it and invite a Streisand effect that would hurt them even more. And OpenAI (nice name) stopped releasing their models once they became good enough.

> Even if you can and do attach usage restrictions to your site, I feel it'll hurt the little guy more than the corporations. There are probably no unique linguistic constructions on your site whose absence would make a corporate AI less valuable, but for hackers, tinkerers, and eventual historians, who knows what it'll interfere with.

I don't want to hurt anyone. I wish AI companies would do the right thing and simply ask for permission before taking someone's work and training on it. I'd probably have agreed if they'd asked a few years back!

I know my contribution to the larger model is extremely insignificant. However, my incentive to help others is greatly diminished when my wishes and ethical concerns are ignored so blatantly. I also don't think I'm alone in this. The amount of digital art I'm seeing in my timelines has greatly decreased, for example; more and more is being locked away behind paywalls because sharing your work freely only helps megacorporations replace you.
