A federal judge sides with Anthropic in lawsuit over training AI on books

>>moose4+(OP)
One aspect of this ruling [1] that I find concerning: on pages 7 and 11-12, it concedes that the LLM does substantially "memorize" copyrighted works, but rules that this doesn't violate the authors' copyright because Anthropic has server-side filtering to avoid reproducing memorized text. (Alsup compares this to Google Books, which has server-side searchable full-text copies of copyrighted books, but only allows users to access snippets in a non-infringing manner.)

Does this imply that distributing open-weights models such as Llama is copyright infringement, since users can trivially run the model without output filtering to extract the memorized text?

[1]: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

>>Nobody+fc
You can use the copyrighted text for personal purposes.

>>deadba+Gc
But you can’t distribute it, which in the scenario mentioned in the parent’s final paragraph arguably happens.

>>layer8+dd
You can't distribute the copyrighted works, but the model isn't inherently the same thing as those works.

It's sort of like distributing a compendium of book reviews. Many of the reviews have quotes from the book. If there are thousands of reviews, you could potentially reconstruct the whole book, but that's not the point of the thing and so it makes sense for the infringing thing to be "using it to reconstruct the whole book" rather than "distributing the compendium".

And then Anthropic fended off the argument that their service was intended for doing the former because they were explicitly taking measures to prevent that.

>>Anthon+Tv
The premise was that the model is able to reproduce the memorized text, and that what saved Anthropic was its server-side filtering to avoid reproducing that text. So the presumption is that without those filters, the model would be able to reproduce text substantial enough to constitute a copyright violation (otherwise they wouldn’t need the filter argument). On that premise, distributing a “machine” that produces such output would constitute copyright infringement.

Maybe this is a misrepresentation of the actual Anthropic case, I have no idea, but it’s the scenario I was addressing.
