zlacker

[return to "Anthropic cut up millions of used books, and downloaded 7M pirated ones – judge"]
1. dehrma+DS[view] [source] 2025-07-07 16:02:04
>>pyman+(OP)
The important parts:

> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use

> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"

It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.

2. sershe+zv1[view] [source] 2025-07-07 20:05:24
>>dehrma+DS
I'm not sure how I feel about what Anthropic did on the merits, given the scale, but from a legalistic standpoint how is it different from using the book to train the meat model in my head? I could even learn bits by heart and quote them in context.
3. mzmzmz+iN5[view] [source] 2025-07-09 14:40:52
>>sershe+zv1
Not sure about the law, but if you memorize and quote bits of a book and fail to attribute them, you could be accused of plagiarism. If, for example, you were a journalist or researcher, this could have professional consequences. Anthropic is building tools that do the same at immense scale with no concept of what plagiarism or attribution even is, let alone any method to track sourcing, and they're still willing to sell these tools. So even if your meat model and the trained model do something similar, you have a notably different understanding of what you're doing. Responsibility might ultimately fall to the end user, but it seems like something is getting laundered here.