Does this imply that distributing open-weights models such as Llama is copyright infringement, since users can trivially run the model without output filtering to extract the memorized text?
[1]: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...
Additionally, if you download a model file that contains enough of the source material to be considered infringing (even without running the LLM; assume you can extract the contents directly from the weights), then it might as well be a .zip with a PDF in it. The model file itself becomes an infringing object, whereas closed models can be held accountable not for what they store but for what they produce.
As far as I could tell, the book didn't match what's posted online today. The text stayed roughly on topic, but it was poorly written and referred to sections that I don't think exist; no amount of prompting could locate them. I'm not convinced the material presented to me was actually the book, even though it fit the topic of the chapter.
I tried to ascertain when the book had been scraped, but couldn't find a match on Archive.org or in the book's git repo.
Eventually I gave up and just continued reading the PDF.