edit: Would be very funny if OpenAI used an educational fair use defense
I hope you don’t think that’s all that’s happening, right?
>LLM training is a special type of reading that should be considered infringement
OK, what turn of phrase would you prefer?
> As outlined in the lawsuit, the Times alleges OpenAI and Microsoft’s large language models (LLMs), which power ChatGPT and Copilot, “can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style.” This “undermine[s] and damage[s]” the Times’ relationship with readers, the outlet alleges, while also depriving it of “subscription, licensing, advertising, and affiliate revenue.”
Nobody can argue that OpenAI was feeding the content to ChatGPT because ChatGPT was bored or was curious about current events. It was fed NYT's content so it would know how to reproduce similar content, for profit.
I think getting case law on the books as to what is legal, and what is not, with LLMs was inevitable. If it wasn't NYT suing OpenAI, it would be another publisher, or another artist, whose work was used to "train" these systems.
Absolutely not copyright infringement
> mimics its expressive style
Absolutely not copyright infringement
> can generate output that recites Times content verbatim
This one seems the closest to infringement, but still doesn't seem like infringement. A printer has this capability too. If a user told ChatGPT to recite NYT content and then sold that content, that would be 100% infringement, but would probably be on the user, not the tool. e.g. if someone printed out NYT articles and sold them, nobody would come after the printer manufacturer.
> “undermine[s] and damage[s]” the Times’ relationship with readers, the outlet alleges, while also depriving it of “subscription, licensing, advertising, and affiliate revenue.”
This claim seems far-fetched, as the point of the NYT is to report the news. One thing that LLMs absolutely cannot do is report today's news. I can see no way that ChatGPT is a substitute for the NYT in a way that violates copyright.
Because ultimately, our entire knowledge is based on the knowledge of others and is remixed, 'charged' and changed by us after reading. I also think that the New York Times uses the contents of others to create new content.
Sounds like journalism school?
You have to imagine these limits are already fairly well known within the legal community... If you're accused of copying/republishing my published work, there will be some minimal threshold of similarity I would need to prove in order to seek damages.
The fact that the model can reproduce large chunks of the original text verbatim is proof positive that it contains copies of the original text encoded in its weights. If I wrote a program that crawled the NYT site, zipping the contents, and retrieved articles based on keyword searches and made them available online, would you not say I'm infringing their copyright?
> e.g. if someone printed out NYT articles and sold them, nobody would come after the printer manufacturer.
If the printer manufacturer had a product that could take one sentence and print multiple pages completing a news article from that sentence, ...
2. The non-profit OpenAI, Inc. company is not to be confused with the for-profit OpenAI GP, LLC [0] that it controls. OpenAI was solely a non-profit from 2015-2019, and, in 2019, the for-profit arm was created, prior to the launch of ChatGPT. Microsoft has a significant investment in the for-profit company, which is why they're included in this lawsuit.