zlacker

"A federal judge sides with Anthropic in lawsuit over training AI on books"
1. Fluore+bN 2025-06-24 20:51:07
>>moose4+(OP)
I'm surprised we never discuss a precedent: how governments handled a valuable new technology that challenged creatives' ability to monetise their work.

Cassette tapes and the private copying levy.

https://en.wikipedia.org/wiki/Private_copying_levy

Governments didn't ban tapes; they taxed them and fed the proceeds back into the royalty system. An equivalent for books might be an LLM tax that funds a negative tax rate on book sales, e.g. an author earns $5 on a sale and the government tops it up. I can't imagine how to make it fair, though.

Alternatively, it might be an interesting math problem to calculate royalties for the training data used in each user request!
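A minimal sketch of the top-up idea, assuming (hypothetically) that the LLM levy pool is split pro rata by copies sold, so every sale receives the same flat top-up. The function name `levy_top_ups`, the pro-rata rule, and the figures are illustrative assumptions, not drawn from any actual levy scheme.

```python
def levy_top_ups(levy_pool: float, units_sold: dict[str, int]) -> dict[str, float]:
    """Hypothetical scheme: split a levy pool across titles in proportion to copies sold."""
    total_units = sum(units_sold.values())
    if total_units == 0:
        # No sales recorded: nothing to top up (the pool stays undistributed).
        return {title: 0.0 for title in units_sold}
    # Flat top-up added to every copy sold, whatever the title.
    per_unit = levy_pool / total_units
    return {title: units * per_unit for title, units in units_sold.items()}


print(levy_top_ups(1000.0, {"A": 300, "B": 100}))
```

Here a $1,000 pool over 400 copies gives a $2.50 top-up per copy, so title A receives 750.0 and title B 250.0. A flat per-copy rule is the simplest choice; it ignores price and how much each title actually contributed to training, which is where the fairness problem lies.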

2. bonobo+VP 2025-06-24 21:07:04
>>Fluore+bN
Surely this would require evidence that the public is actually using LLMs as a substitute for purchasing the book, i.e. they sit down, type "Generate me the first/second/third chapter of The Da Vinci Code" and read it from there. In the cassette era it was easy to observe that people copied store-bought music and films and shared them with each other. I doubt this is, or will become, a serious use case of LLMs.
3. Fluore+bW 2025-06-24 21:52:22
>>bonobo+VP
It's different, but not in ways that make such interventions irrelevant. For example, why should we only care about lost sales? If copyright was violated as a necessary means of generating new value, haven't the content creators earned that value?

Such imperfect measures offer a compromise between "big tech can steal everything" and "LLMs trained on unpurchased books are illegal".

It's not just books, but any tragedy-of-the-commons situation where a "feeder industry" for training data can be fatally undermined by the very LLM that needs future training data from that industry.

4. bonobo+l01 2025-06-24 22:29:25
>>Fluore+bW
> It's different but not in ways that make such interventions irrelevant e.g. why would we only care about lost sales? If copyright has been violated as a necessary means to generate new value, haven't the content creators earned this value?

Indeed, the company should purchase the books. If it obtains copies through a process that violates copyright, then that is a copyright violation.

The current decision does not rule on the legality of obtaining the books without purchasing them.

5. ethbr1+Fa1 2025-06-25 00:02:40
>>bonobo+l01
Anthropic apparently did it both ways. After realizing that pirating mass quantities of books for training wasn't a great legal look, it hired someone previously responsible for Google Books, who in turn contacted publishers about mass licensing their content for training use.

However, that option was ultimately not pursued. Instead:

> Anthropic spent many millions of dollars to purchase millions of print books, often in used condition. Then, its service providers stripped the books from their bindings, cut their pages to size, and scanned the books into digital form — discarding the paper originals. Each print book resulted in a PDF copy containing images of the scanned pages with machine-readable text (including front and back cover scans for softcover books). Anthropic created its own catalog of bibliographic metadata for the books it was acquiring. It acquired copies of millions of books, including of all works at issue for all Authors.

(from the ruling)
