A federal judge sides with Anthropic in lawsuit over training AI on books

>>moose4+(OP)
Broadly summarizing.

This is OK and fair use: Training LLMs on copyrighted work, since it's transformative.

This is not OK and not fair use: pirating data, or creating a big repository of pirated data that isn't necessarily for AI training.

Overall seems like a pretty reasonable ruling?

>>3PS+V1
But those training the LLMs are still using the works, and not just to discuss them, which I think is the point of fair use doctrine. I guess I fail to see how it's any different from me using it in some other way? If I wanted to write a play very loosely inspired by Blood Meridian, it might be transformative, but that doesn't justify me pirating the book.

I tend to think copyright should be extremely limited compared to what it is now, but to me the logic of this ruling is illogical other than "it's ok for a corporation to use lots of works without permission but not for an individual to use a single work without permission." Maybe if they suddenly loosened copyright enforcement for everyone I might feel differently.

"Kill one man, and you are a murderer. Kill millions of men, and you are a conqueror." (An admittedly hyperbolic comparison, but similar idea.)

>>derbOa+H6
>If I wanted to write a play very loosely inspired by Blood Meridian, it might be transformative, but that doesn't justify me pirating the book.

I think that's the conclusion of the judge. If Anthropic were to buy the books and train on them, without extra permission from the authors, it would be fair use, much like if you were to be inspired by it (though in that case, it may not even count as a derivative work at all, if the relationship is sufficiently loose). But that doesn't mean they are free to pirate it either, so they are likely to be liable for that (exactly how that interpretation works with copyright law I'm not entirely sure: I know in some places that downloading stuff is less of a problem than distributing it to others because the latter is the main thing that copyright is concerned with. And AFAIK most companies doing large model training are maintaining that fair use also extends to them gathering the data in the first place).

(Fair use isn't just for discussion. It covers a broad range of potential use cases, and they're not enumerated precisely in copyright law AFAIK, there's a complicated range of case law that forms the guidelines for it)

>>rcxdud+Y7
which AFAIN IANAL, copyright and exhaustive rights are completely different. Under copyright, once a book is purchased: that's it. Reselling the same, or transformed (re: highlighted) worked 'used' is 100% legal, as is consuming it at your discretion (in your mind {a billion times}, a fire, or (yes even) what amounts to a fancy calculator).

(that's all to say copyright is dated and needs an overhaul)

But that's taking a viewpoint of 'training a personal AI in your home', which isn't something that actually happens... The issue has never been the training data itself. Training an AI and 'looking at data and optimizing a (human understanding/AI understanding) function over it' are categorically the same, even if mechanically/biologically they are very different.

zlacker