A federal judge sides with Anthropic in lawsuit over training AI on books

>>sidewn+i8
https://en.wikipedia.org/wiki/Andy_Warhol_Foundation_for_the...

I wouldn't call it that. Goldsmith took a photograph of Prince which Warhol used as a reference to generate an illustration. Vanity Fair then chose to buy a license for Warhol's print instead of Goldsmith's photograph.

So, despite the artwork being visually transformative (silkscreen vs photograph), the actual use was not transformed.

>>moose4+(OP)
One aspect of this ruling [1] that I find concerning: on pages 7 and 11-12, it concedes that the LLM does substantially "memorize" copyrighted works, but rules that this doesn't violate the author's copyright because Anthropic has server-side filtering to avoid reproducing memorized text. (Alsup compares this to Google Books, which has server-side searchable full-text copies of copyrighted books, but only allows users to access snippets in a non-infringing manner.)

Does this imply that distributing open-weights models such as Llama is copyright infringement, since users can trivially run the model without output filtering to extract the memorized text?

[1]: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

>>ticula+J7
What you are describing happened and they got sued:

https://en.wikipedia.org/wiki/Mickey_Mouse#Walt_Disney_Produ...

I'm on the Air Pirates side for the case linked, by the way.

However, AI is not a parody. It's not adding to the cultural expression like a parody would.

Let's forget all the law stuff and these silly hypotheticals. Let's think of humanity instead:

Is AI contributing to education and/or culture _right now_, or is it trying to make money? I think they're trying to make money.

>>johnny+97
I wonder if https://en.wikipedia.org/wiki/Illegal_number comes into play here.

>>doctor+n8
Do you mean Authors Guild, Inc. v. Google, Inc.? Google won that case:

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

Maybe there's another big Google Books lawsuit that Google ultimately lost, but I don't know which one you mean in that case.

>>almata+Kb
What contracts? And would it run afoul of first sale doctrine?

https://en.wikipedia.org/wiki/First-sale_doctrine

> The doctrine was first recognized by the Supreme Court of the United States in 1908 (see Bobbs-Merrill Co. v. Straus) and subsequently codified in the Copyright Act of 1909. In the Bobbs-Merrill case, the publisher, Bobbs-Merrill, had inserted a notice in its books that any retail sale at a price under $1.00 would constitute an infringement of its copyright. The defendants, who owned Macy's department store, disregarded the notice and sold the books at a lower price without Bobbs-Merrill's consent. The Supreme Court held that the exclusive statutory right to "vend" applied only to the first sale of the copyrighted work.

> Today, this rule of law is codified in 17 U.S.C. § 109(a), which provides:

> Notwithstanding the provisions of section 106 (3), the owner of a particular copy or phonorecord lawfully made under this title, or any person authorized by such owner, is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord.

---

If I buy a copy of a book, you can't limit what I can do with the book beyond what copyright restricts me.

>>Nobody+fc
A judge already ruled that models themselves don't constitute copyright infringement in Kadrey v. Meta Platforms, Inc. (https://casetext.com/case/kadrey-v-meta-platforms-inc). The EFF has a good summary about it:

> the court dismissed “nonsensical” claims that Meta’s LLaMA models are themselves infringing derivative works.

See: https://www.eff.org/deeplinks/2025/02/copyright-and-ai-cases...

>>layer8+tg
Correct.

You have to call it "Starcrash" (https://www.imdb.com/title/tt0079946/?ref_=ls_t_8). Then it's legal.

>>Nobody+fc
Yes and no.

In this case, the plaintiffs alleged that Anthropic's LLMs had memorized the works so completely that "if each completed LLM had been asked to recite works it had trained upon, it could have done so", "almost verbatim". The judge assumed for the sake of argument that the allegation was true, and ruled that the conduct was fair use anyway due to the existence of an effective filter. Therefore there was no need to determine whether the allegation was actually true.

So - yes, in the sense that the ruling suggests that distributing an open-weight LLM that memorized copyrighted works to that extent would not be fair use.

But no, in the sense that it's not clear whether any LLMs, especially open-weight LLMs, actually memorize book-length works to that extent. Even the recent study about Llama memorizing a Harry Potter book [1] only said that Llama could reproduce 50-token snippets a decent percentage of the time when given the preceding 50 tokens. That's different from actually being able to recite any substantial portion of the book. If you asked Llama for that, the output would quickly diverge from the original text, and it likely wouldn't be able to get back on track without being re-prompted from the ground truth as the study did.

On the other hand, in the case where the New York Times is suing OpenAI, the NYT has alleged that ChatGPT was able to recite extensive portions of NYT articles verbatim. If true, this might be more dangerous, since news articles are not as long as books but they're equally eligible for copyright protection. So we'll see how that shakes out.

Also note:

- Nothing in the opinion sets formal precedent because it's a district court. But the opinion might still influence later judges.

- See also riskable's sibling comment for another case where a judge addressed the issue more head-on (but wasn't facing the same kind of detailed allegations, I don't think; haven't checked).

[1] https://arxiv.org/abs/2412.06370
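
To make the distinction above concrete, here is a minimal sketch of the two measurements being contrasted: prefix-conditioned snippet recall (the model sees 50 ground-truth tokens and must produce the next 50) versus free-running recitation (the model only ever sees its own output after the first prompt). This is not the study's code. `continue_fn`, `make_lookup_model`, and the integer "tokens" are hypothetical stand-ins for a real model and tokenizer, and exact-match scoring is an assumption.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given a prefix and a count n, return the model's next n tokens.
ContinueFn = Callable[[Sequence[int], int], List[int]]


def snippet_recall_rate(tokens: Sequence[int], continue_fn: ContinueFn,
                        prefix_len: int = 50, suffix_len: int = 50,
                        stride: int = 10) -> float:
    """Fraction of windows whose suffix is reproduced exactly, given the TRUE prefix."""
    window = prefix_len + suffix_len
    hits = 0
    total = 0
    for start in range(0, len(tokens) - window + 1, stride):
        prefix = list(tokens[start:start + prefix_len])
        truth = list(tokens[start + prefix_len:start + window])
        if list(continue_fn(prefix, suffix_len)) == truth:
            hits += 1
        total += 1
    return hits / total if total else 0.0


def free_running_match_length(tokens: Sequence[int], continue_fn: ContinueFn,
                              prefix_len: int = 50) -> int:
    """Tokens recited correctly when the model is prompted once and then
    fed only its own output. Stops at the first divergence."""
    context = list(tokens[:prefix_len])
    matched = 0
    for expected in tokens[prefix_len:]:
        nxt = list(continue_fn(context[-prefix_len:], 1))
        if not nxt or nxt[0] != expected:
            break
        context.append(nxt[0])
        matched += 1
    return matched


def make_lookup_model(memorized: Sequence[int]) -> ContinueFn:
    """Toy stand-in for an LLM: continues a prefix only if it appears in `memorized`."""
    memorized = list(memorized)

    def continue_fn(prefix: Sequence[int], n: int) -> List[int]:
        k = len(prefix)
        for i in range(len(memorized) - k + 1):
            if memorized[i:i + k] == list(prefix):
                return memorized[i + k:i + k + n]
        return []

    return continue_fn


if __name__ == "__main__":
    book = list(range(300))                 # toy "book" of 300 unique tokens
    partial = make_lookup_model(book[:150])  # toy model that memorized the first half
    print(snippet_recall_rate(book, partial))
    print(free_running_match_length(book, partial))
```

With a real model the first number can be fairly high while the second stays small, which is the gap between "reproduces snippets when re-prompted from the ground truth" and "can recite the book".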

>>UltraS+jz
>>44369227

If the US makes it illegal to train LLMs on copyrighted data, the US will find a solution and not just give up and wait half a decade to see what China does in the meantime.

>>moose4+(OP)
I'm surprised we never discuss a previous case of how governments handled a valuable new technology that challenged creatives' ability to monetise their work:

Cassette Tapes and Private Copying Levy.

https://en.wikipedia.org/wiki/Private_copying_levy

Governments didn't ban tapes but taxed them and fed the proceeds back into the royalty system. An equivalent for books might be an LLM tax funding a negative tax rate for sold books e.g. earn $5 and the gov tops it up. Can't imagine how to ensure it was fair though.

Alternatively, might be an interesting math problem to calculate royalties for the training data used in each user request!

>>thinki+HA
Which is amusing because the NYTimes has fought in court a few times in favour of technology progress over copyright. Including recently when they got sued over collecting a bunch of freelance writing into a database without consent. https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-t...

I doubt the exact replica stuff will stand, as technically it was only achievable via advanced prompt engineering (hacking), not simply asking for a replica. So their 2 other arguments boil down to scraping a news database = infringement and LLM output = derivative works.

>>fallin+pU
> Says who?

Artists.

https://en.wikipedia.org/wiki/SAG-AFTRA

> How on earth are those things mutually exclusive?

Put those on a spectrum and rethink what I said.

> completely irrelevant to whether or not it is copyright infringement

_Again_, leave aside law minutiae and hypotheticals.

>>bonobo+821
A quick Google search will reveal that this is not the case. Summaries of books or movies have no particular legal protection and the authors of those summaries may be sued by the owners of that content.

https://1minutebook.com/are-book-summaries-legal/

Fair use is a defense often cited in those cases but it's just that: a defense. Cliff Notes are often cited here but they actually license the content in many cases.

>>algane+yW
> > Says who?

> Artists.

> https://en.wikipedia.org/wiki/SAG-AFTRA

Do you have a link that has their stance on how AI is harming culture? The best I could find is https://www.sagaftra.org/contracts-industry-resources/member...

I can't find anything in there or its linked articles about culture. I do find quite a bit about synthetic performers and digital replicas and making sure that people who do voice acting don't have their performance used to generate material that is done at a discounted rate and doesn't reimburse the performer.

https://www.sagaftra.org/ongoing-fight-ai-protections-makes-...

> Protective A.I. guardrails for actors who work in video games remain a point of contention in the Interactive Media Agreement negotiations which have been ongoing from October 2022 until last month’s strike. Other A.I.-related panels Crabtree-Ireland participated in included a U.S. Department of Justice and Stanford University co-hosted event about promoting competition in A.I., as well as a Vanderbilt University summit on music law and generative A.I. SAG-AFTRA Executive Vice President Linda Powell discussed the interactive negotiations and A.I.’s many implications for creatives during her keynote speech at an Art in the Age of A.I. symposium put on by Villa Albertine at the French Embassy.

> She said A.I. represents “a turning point in our culture,” adding, “I think it’s important that we be participants in it and not passengers in it ... We need to make our voices known to the handful of people who are building and profiting off of this brave new world.”

This doesn't indicate that it's good or bad, but rather that they want to make sure that people are in control of it and people are compensated for the works that are created from their performance.

>>rasz+RJ1
Full ruling is here (https://storage.courtlistener.com/recap/gov.uscourts.cand.43...)

The analogy the judge gives is to how Google Books walked the tightrope on copyright: they maintain an archive of all the books for indexing and search purposes, and can display excerpts to help you confirm that's what you're looking for. The excerpts are constrained so you can't read the whole book by scanning the excerpts.

If post-filtering the LLM signal is illegal, shouldn't the Google Books archive also be illegal? If not, why not?

And if you believe it should be, understand that the way precedent works, the judge won't be ruling that way without pulling some fire on themselves, because it is not the business of another case to contradict the conclusions of a previous court in a previous case. Copyright law is arbitrary and highly path-dependent because the underlying goal is forever in tension with itself, that goal being providing societal benefit by creating artificial scarcity on something that is, by its nature, not scarce at all.

(Worth noting: Anthropic didn't get off scot-free. The ruling was that the created artifact, the LLM, was a fair-use product, but the way it was created was through massive piracy and Anthropic is liable for that copying).

>>fallin+OV
Your take on how copyright infringement works only counts for unregistered copyrights. If the copyrighted works are registered with the Copyright Office, statutory damages apply:

https://www.law.cornell.edu/uscode/text/17/504
