zlacker

[return to "The New York Times is suing OpenAI and Microsoft for copyright infringement"]
1. kbos87+Na 2023-12-27 15:03:43
>>ssgodd+(OP)
Solidly rooting for NYT on this - it’s felt like many creative organizations have been asleep at the wheel while their lunch gets eaten for a second time (the first being at the birth of modern search engines).

I don’t necessarily fault OpenAI’s decision to initially train their models without entering into licensing agreements - they probably wouldn’t exist, and the generative AI revolution may never have happened, if they hadn’t put the cart before the horse. I do think they should quickly course correct at this point and accept that they clearly owe something to the creators of the content they are consuming. If they don’t, they are setting themselves up for a bigger loss down the road and leaving the door open for a more established competitor (Google) to do it the right way.

2. theGnu+Ri 2023-12-27 15:47:48
>>kbos87+Na
It’s likely fair use.
3. JCM9+Bj 2023-12-27 15:53:11
>>theGnu+Ri
Playing back large passages of verbatim content, sold as your “product” without citation, is almost certainly not fair use. Fair use would be saying “The New York Times said X” and then quoting a sentence with attribution. That’s not what OpenAI is being sued for. They’re being sued for passing off substantial chunks of NYTimes content as their own IP and then charging for it.

This also ties back to earlier studies of OpenAI’s models, which have a bad habit of regurgitating training data verbatim. If your training data is protected IP you didn’t secure the rights to, that’s a very big problem. Hence this lawsuit. If it succeeds, the floodgates will open.

4. ethbr1+3o 2023-12-27 16:20:12
>>JCM9+Bj
At the root, it seems like there's also a gap in copyright law, with respect to AI, around what counts as transformative use.

Is using something, in its entirety, as a tiny bit of a massive data set, in order to produce something novel... infringing?

That's a pretty weird question that never existed when copyright was defined.

5. layer8+Iq 2023-12-27 16:33:01
>>ethbr1+3o
Replace the AI model with a human, and it becomes pretty clear what is and isn’t allowed in terms of published output. The issue is that an AI model is like a human that you can force to produce copyright-infringing output, or at least one where you have little control over whether the output is infringing or not.