The New York Times is suing OpenAI and Microsoft for copyright infringement

>>ssgodd+(OP)
Solidly rooting for NYT on this - it’s felt like many creative organizations have been asleep at the wheel while their lunch gets eaten for a second time (the first being at the birth of modern search engines).

I don’t necessarily fault OpenAI’s decision to initially train their models without entering into licensing agreements - they probably wouldn’t exist, and the generative AI revolution may never have happened, if they had insisted on licensing first. I do think they should quickly course correct at this point and accept the fact that they clearly owe something to the creators of the content they are consuming. If they don’t, they are setting themselves up for a bigger loss down the road and leaving the door open for a more established competitor (Google) to do it the right way.

>>kbos87+Na
It’s likely fair use.

>>theGnu+Ri
Playing back large passages of verbatim content sold as your “product” without citation is almost certainly not fair use. Fair use would be saying “The New York Times said X” and then quoting a sentence with attribution. That's not what OpenAI is being sued for. They’re being sued for passing off substantial bits of NYTimes content as their own IP and then charging for it saying it’s their own IP.

This is also related to earlier studies about OpenAI showing that their models have a bad habit of regurgitating training data verbatim. If your training data is protected IP you didn’t secure the rights for, then that’s a real big problem. Hence this lawsuit. If it succeeds, the floodgates will open.

>>JCM9+Bj
> They’re being sued for passing off substantial bits of NYTimes content as their own IP and then charging for it saying it’s their own IP.

In what sense are they claiming their generated contents as their own IP?

https://www.zdnet.com/article/who-owns-the-code-if-chatgpts-...

> OpenAI (the company behind ChatGPT) does not claim ownership of generated content. According to their terms of service, "OpenAI hereby assigns to you all its right, title and interest in and to Output."

https://openai.com/policies/terms-of-use

> Ownership of Content. As between you and OpenAI, and to the extent permitted by applicable law, you (a) retain your ownership rights in Input and (b) own the Output. We hereby assign to you all our right, title, and interest, if any, in and to Output.

>>aragon+or
They can’t transfer rights to the output if it isn’t theirs to begin with.

Saying they don’t claim the rights over their output while outputting large chunks verbatim is the old YouTube scheme of uploading a movie and saying “no copyright intended”.

>>_aavaa+ox
Exactly. And while one can easily just take down such a movie if an infringement claim is filed, it’s unclear how one “removes” content from a trained model given how these models work. That's messy.

>>JCM9+dH
If it’s found that the use of the material is infringing on the rights of the copyright holder, then the AI company has to retrain their model without any material they don’t have a right to. Pretty clear to me.

>>_aavaa+eK
By that logic Microsoft Word should have to refuse to save or print any text that contained copyrighted content. GPT is just a tool; the user who's asking it to produce copyrighted content (and then publishing that content) is the one violating the copyright, and they're the ones who should be liable.

>>logicc+ZT
I don’t even know where to begin on this example.

The situations aren’t remotely similar, and that much should be obvious. In one instance ChatGPT is reproducing copyrighted work, and in the other Word is taking keyboard input from the user; Word isn’t producing anything itself.

> GPT is just a tool.

I don’t know what point this is supposed to make. It is not “just a tool” in the sense that it has no impact on what gets written.

Which brings us back to the beginning.

> the user who’s asking it to produce copyrighted content.

ChatGPT was trained on copyrighted content. The fact that it CAN reproduce the copyrighted content and the fact that it was trained on it is what the argument is about.
