Does this imply that distributing open-weights models such as Llama is copyright infringement, since users can trivially run the model without output filtering to extract the memorized text?
[1]: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...
It's sort of like distributing a compendium of book reviews. Many of the reviews quote from the book. With thousands of reviews, you could potentially reconstruct the whole book from the quotes, but that's not the purpose of the compendium, so it makes sense for the infringing act to be "using it to reconstruct the whole book" rather than "distributing the compendium".
And then Anthropic fended off the argument that its service was intended for that kind of reconstruction, because it was explicitly taking measures to prevent it.
Maybe this misrepresents the actual Anthropic case; I have no idea, but it’s the scenario I was addressing.