The New York Times is suing OpenAI and Microsoft for copyright infringement

>>ssgodd+(OP)
What are they arguing here? AFAIK reading copyrighted works is not copyright infringement. Copying and selling them is, as the name would suggest, but OpenAI absolutely did not do that. Are they trying to say that LLM training is a special type of reading that should be considered infringement? Seems like a weak case to me.

edit: Would be very funny if OpenAI used an educational fair use defense

>>fallin+E2
It should be noted that there are explicit exemptions (in many licenses) to allow copying program data into RAM and into CPU registers. Whether that is truly necessary is debatable at best, but arguably training a model (especially one you then distribute or give access to) on copyrighted data is vastly different from regular copying into memory and should require explicit licensing.

The fact that the model can reproduce large chunks of the original text verbatim is proof positive that it contains copies of the original text encoded in its weights. If I wrote a program that crawled the NYT site, zipped the contents, retrieved articles based on keyword searches, and made them available online, would you not say I'm infringing their copyright?
