zlacker

Playing back large passages of verbatim content sold as your “product” without citation is almost certainly not fair use. Fair use would be saying “The New York Times said X” and then quoting a sentence with attribution. Thats not what OpenAI is being sued for. They’re being sued for passing off substantial bits of NYTimes content as their own IP and then charging for it saying it’s their own IP.

This is also related to earlier studies about OpenAI where their models have a bad habit of just regurgitating training data verbatim. If your trained data is protected IP you didn’t secure the rights for then that’s a real big problem. Hence this lawsuit. If successful, the floodgates will open.

replies(3): >>ethbr1+s4 >>aragon+N7 >>shkkmo+h8

>>JCM9+(OP)
At the root, it seems like there's also a gap in copyright with respect to AI around transformative.

Is using something, in its entirety, as a tiny bit of a massive data set, in order to produce something novel... infringing?

That's a pretty weird question that never existed when copyright was defined.

replies(2): >>layer8+77 >>bawolf+mb

>>ethbr1+s4
Replace the AI model by a human, and it should become pretty clear what is allowed and what isn’t, in terms of published output. The issue is that an AI model is like a human that you can force to produce copyright-infringing output, or at least where you have little control over whether the output is copyright-infringing or not.

replies(1): >>bawolf+cc

>>JCM9+(OP)
> They’re being sued for passing off substantial bits of NYTimes content as their own IP and then charging for it saying it’s their own IP.

In what sense are they claiming their generated contents as their own IP?

https://www.zdnet.com/article/who-owns-the-code-if-chatgpts-...

> OpenAI (the company behind ChatGPT) does not claim ownership of generated content. According to their terms of service, "OpenAI hereby assigns to you all its right, title and interest in and to Output."

https://openai.com/policies/terms-of-use

> Ownership of Content. As between you and OpenAI, and to the extent permitted by applicable law, you (a) retain your ownership rights in Input and (b) own the Output. We hereby assign to you all our right, title, and interest, if any, in and to Output.

replies(5): >>_aavaa+Nd >>jprete+Ud >>JCM9+Mm >>tsimio+Bp >>lelant+8u

>>JCM9+(OP)
I would note that in the examples the NYT cites, the prompts explicitly ask for the reproduction of content.

I think it makes sense to hold model makers responsible when their tools make infringement too easy to do or possible to do accidentally. However that is a far cry from requiring a little longer license to do the trainint in the first place.

>>ethbr1+s4
I think it did come up back in the day sort of, for example with libraries.

More importantly, ever case is unique so what really came up was a set of principles for what defines fair use, which will definitely guide this.

>>layer8+77
Its less clear than you think, and comes down more on how OpenAI is commercially benefiting and competiting with NYT than what they actually did. (See four factors of fair use)

>>aragon+N7
They can’t transfer rights to the output of it isn’t theirs to begin with.

Saying they don’t claim the rights over their output while outputting large chunks verbatim is the old YouTube scheme of upload movie and say “no copyright intended”.

replies(1): >>JCM9+Cn

>>aragon+N7
That part doesn't seem relevant to me in any case. IP pirates aren't prosecuted or sued because of a claim of ownership; they're prosecuted or sued over possession, distribution, or use.

>>aragon+N7
The bits you cite are legally bogus.

That would be like me just photocopying a book you wrote and then handing out copies saying we’re assigning different rights to the content. The whole point of the lawsuit is that OpenAI doesn’t own the content and thus they can’t just change the ownership rights per their terms of service. It doesn’t work like that.

replies(1): >>aragon+Eo

>>_aavaa+Nd
Exactly. And while one can easily just take down such a movie if an infringement claim is filed it’s unclear how one “removes” content from a trained model given how these models work. Thats messy.

replies(1): >>_aavaa+Dq

>>JCM9+Mm
Their legalese is careful to include the 'if any' qualifier ("We hereby assign to you all our right, title, and interest, if any, in and to Output.")

In any case, the point is that they made no claim to Output (as opposed to their code, etc) being their IP.

replies(1): >>tsimio+5q

>>aragon+N7
They are distributing the output, so they (implicitly) claim to have the right to distribute it. I can send you a movie I downloaded along with a license that says "I hereby assign to you all our right, title, and interest, if any, in and to Output. ", I'm still obviously infringing on the copyright of that movie (unless I have a deal that allows re-distribution, of course, as Netflix does).

>>aragon+Eo
That's irrelevant. The main point is that they are re-distributing the content without permission from the copyright owners, so they are sort of implicitly claiming they have copy/distribution rights over it. Since they don't, then it's obvious they can't give you this content at all.

replies(1): >>logicc+vA

>>JCM9+Cn
If it’s found that the use of the material is infringing on the rights of the copyright holder than the AI company has to retrain their model without any material they don’t have a right to. Pretty clear to me

replies(1): >>logicc+oA

>>aragon+N7
> In what sense are they claiming their generated contents as their own IP?

https://www.zdnet.com/article/who-owns-the-code-if-chatgpts-...

>> OpenAI (the company behind ChatGPT) does not claim ownership of generated content. According to their terms of service, "OpenAI hereby assigns to you all its right, title and interest in and to Output."

How are they giving you the rights to the work if they don't own it? They are literally asserting that they are in a position to assign the rights (to the output) to the user - that is a literal claim of ownership.

IOW, if someone says "Take this from me, I assure you it is legal to do so", they are asserting ownership of that thing.

>>_aavaa+Dq
By that logic Microsoft Word should have to refuse to save or print any text that contained copyrighted content. GPT is just a tool; the user who's asking it to produce copyrighted content (and then publishing that content) is the one violating the copyright, and they're the ones who should be liable.

replies(1): >>_aavaa+lE

>>tsimio+5q
>The main point is that they are re-distributing the content without permission from the copyright owners,

By your logic, Firefox is re-distributing content without permission from the copyright owners whenever you use it to read a pirated book. ChatGPT isn't just randomly generating copyrighted content, it just does so when explicitly prompted by a user.

replies(1): >>tsimio+qC

>>logicc+vA
That is not the same thing at all. If I search on Google for copyrighted content and Google shows me the content, it is the server which serves the content who is most directly responsible, not Google nor I. Firefox is only a neutral agent, whereas ChatGPT is the source of the copyrighted content.

Of course, if the input I give to ChatGPT is "here is a piece from an NYT aricle, please tell it to me again verbatim", followed by a copy I got from the NYT archive, and ChatGPT is returning the same text I gave it as input, that is not copyright infringement. But if I say "please show me the text of the NYT article on crime from 10th January 1993", and ChatGPT returns the exact text of that article, then they are obviously infringing on NYT's distribution rights for this content, since they are retrieving it from their own storage.

If they returned a link you could click, t and retrieved the content from the NYT, along with any other changes such as advertising, even if it were inside an iframe, it would be an entirely different matter.

>>logicc+oA
I don’t even know where to begin on this example.

The situations aren’t remotely similar and that much should be obvious. In one instance ChatGPT is reproducing copyrighted work and in the other Word is taking keyboard input from the user; Word itself isn’t producing anything itself.

> GPT is just a tool.

I don’t know what point this is supposed to make. It is not “just a tool” in the sense that it has no impact on what gets written.

Which brings us back to the beginning.

> the user who’s asking it to produce copyrighted content.

ChatGPT was trained on copyrighted content. The fact that it CAN reproduce the copyrighted content and the fact that it was trained on it is what the argument is about.