zlacker

44 comments
1. ramon1+(OP) 2025-07-07 10:42:11
Pirate and pay the fine is probably a hell of a lot cheaper than individually buying all these books. I'm not saying this is justified, but what would you have done in their situation?

Saying "they have the money" is not an argument. It's about the amount of effort needed to individually buy, scan, and process millions of pages. If that's done for you, why redo it all?

replies(10): >>Timoro+i1 >>pyman+E1 >>glimsh+b2 >>keving+u4 >>maeln+n7 >>suyjur+i8 >>darkoo+Hg >>tmaly+nq >>bmitc+fJ >>blibbl+nO
2. Timoro+i1 2025-07-07 10:55:07
>>ramon1+(OP)
150K per work is the maximum fine for willful infringement (which this is).

105B+ is more than Anthropic is worth on paper.

Of course they’re not going to be charged to the fullest extent of the law, they’re not a teenager running Napster in the early 2000s.

replies(3): >>voxic1+Zp >>eikenb+e91 >>dragon+I91
3. pyman+E1 2025-07-07 10:59:43
>>ramon1+(OP)
The problem with this thinking is that hundreds of thousands of teachers who spent years writing great, useful books and sharing knowledge and wisdom probably won't sue a billion dollar company for stealing their work. What they'll likely do is stop writing altogether.

I'm against Anthropic stealing teachers' work and discouraging them from ever writing again. Some teachers are already saying this (though probably not in California).

replies(5): >>lofasz+i5 >>Curiou+Q6 >>glimsh+Q7 >>NoMore+zP >>js8+951
4. glimsh+b2 2025-07-07 11:03:08
>>ramon1+(OP)
Isn't "pirating" a felony with jail time, though? That's what I remember from the FBI warning I had to see at the beginning of every DVD I bought (but not "pirated" ones).
replies(1): >>voxic1+vq
5. keving+u4 2025-07-07 11:21:04
>>ramon1+(OP)
Google did it the legal way with Google Books, didn't they?
◧◩
6. lofasz+i5 [discussion] 2025-07-07 11:26:58
>>pyman+E1
They won't be needed anymore, once singularity is reached. This might be their thought process. This also exemplifies that the loathed caste system found in India is indeed in place in western societies.

There is no equality, and seemingly there are worker bees who can be exploited, and there are privileged ones, and of course there are the queens.

replies(2): >>pyman+i6 >>Sketch+su
◧◩◪
7. pyman+i6 [discussion] 2025-07-07 11:32:52
>>lofasz+i5
:D

Note: My definition of singularity isn't the one they use in San Francisco. It's the moment founders who stole the life's work of thousands of teachers finally go to prison, and their datacentres get seized.

replies(1): >>lofasz+u8
◧◩
8. Curiou+Q6 [discussion] 2025-07-07 11:36:48
>>pyman+E1
If you care so little about writing that AI puts you off it, TBH you're probably not a great writer anyhow.

Writers that have an authentic human voice and help people think about things in a new way will be fine for a while yet.

replies(1): >>4b11b4+fl
9. maeln+n7 2025-07-07 11:41:55
>>ramon1+(OP)
If you wanted to be legit with zero chance of going to court, you would contact publishers, offer to pay for a license to access their catalogs for training, and negotiate from there.

This is what every company that uses media does (think Spotify and Netflix, but also journals, ad agencies, ...). I don't know why people on HN are giving AI companies a pass for this kind of behavior.

replies(2): >>ohashi+Uw >>Captai+9e1
◧◩
10. glimsh+Q7 [discussion] 2025-07-07 11:44:24
>>pyman+E1
That will be sad, although there will still be plenty of great people who will write books anyway.

When it comes to a lot of these teachers, I'll say, copyright works hand in hand with college and school course book mandates. I've seen plenty of teachers making crazy money off students' backs due to these mandates.

A lot of the content taught in undergrad and school hasn't changed in decades or even centuries. I think we have all the books we'll ever need in certain subjects already, but copyright keeps enriching people who write new versions of these.

11. suyjur+i8 2025-07-07 11:46:53
>>ramon1+(OP)
Just downloading them is of course cheaper, but it is worth pointing out that, as the article states, they did also buy legitimate copies of millions of books. (This includes all the books involved in the lawsuit.) Based on the judgement itself, Anthropic appears to train only on the books legitimately acquired. Used books are quite cheap, after all, and can be bought in bulk.
replies(1): >>asadot+JF
◧◩◪◨
12. lofasz+u8 [discussion] 2025-07-07 11:48:10
>>pyman+i6
You can bet that this is never going to happen...
replies(1): >>coverc+ox
13. darkoo+Hg 2025-07-07 12:48:15
>>ramon1+(OP)
This is not about paying for a single copy. It would still be wrong even if they had bought every single one of those books. It is a form of plagiarism: the model will use someone else's ideas without proper attribution.
replies(1): >>jeroen+Vy
◧◩◪
14. 4b11b4+fl [discussion] 2025-07-07 13:21:15
>>Curiou+Q6
Yeah, people will still want to write. They might need new ways to monetize it... That being said, even if people still want to write, they may not consider it a viable path. Again, they'd have to consider other ways to monetize.
◧◩
15. voxic1+Zp [discussion] 2025-07-07 13:55:11
>>Timoro+i1
Even if they don't qualify for willful infringement damages (let's say they have a good-faith belief that their infringement was covered by fair use), the standard statutory damages for copyright infringement are $750-$30,000 per work.
16. tmaly+nq 2025-07-07 13:57:19
>>ramon1+(OP)
At minimum they should have to buy the book they are deriving weights from.
replies(1): >>SirMas+uP
◧◩
17. voxic1+vq [discussion] 2025-07-07 13:57:51
>>glimsh+b2
Yes, criminal copyright infringement (willful copyright infringement done for commercial gain or at a large scale) is a felony.
◧◩◪
18. Sketch+su [discussion] 2025-07-07 14:23:08
>>lofasz+i5
> They won't be needed anymore, once singularity is reached.

And it just so happens that that belief says they can burn whatever they want down because something in the future might happen that absolves them of those crimes.

◧◩
19. ohashi+Uw [discussion] 2025-07-07 14:38:20
>>maeln+n7
Because they are mostly software developers who think it's different because it impacts them.
◧◩◪◨⬒
20. coverc+ox [discussion] 2025-07-07 14:41:13
>>lofasz+u8
When the rich and powerful face zero consequences for breaking laws and ignoring the social contracts that keep our society functioning, you wind up with extreme overcorrections. See Luigi.
replies(1): >>achier+8H
◧◩
21. jeroen+Vy [discussion] 2025-07-07 14:49:32
>>darkoo+Hg
Legally speaking, we don't know that yet. Early signs are pointing at judges allowing this kind of crap because it's almost impossible for most authors to point out what part of the generated slop was originally theirs.
◧◩
22. asadot+JF [discussion] 2025-07-07 15:29:11
>>suyjur+i8
Buying a book is not a license to resell that content for your own profit. I can't buy a copy of your book, make a million Xeroxes of it, and sell those. The license you get when you buy a book is for a single use, not a license to do whatever you want with the contents of that book.
replies(2): >>thedev+QP >>suyjur+YR
◧◩◪◨⬒⬓
23. achier+8H [discussion] 2025-07-07 15:38:14
>>coverc+ox
How extreme is that, really? Not to justify murder: that is clearly bad. But "killing one man" is evidently something we, as a society, consider an "acceptable side-effect" when a corporation does it -- hell, you can kill thousands and get away scot-free if you're big enough.

Luigi was peanuts in comparison.

“THERE were two “Reigns of Terror,” if we would but remember it and consider it; the one wrought murder in hot passion, the other in heartless cold blood; the one lasted mere months, the other had lasted a thousand years; the one inflicted death upon ten thousand persons, the other upon a hundred millions; but our shudders are all for the “horrors” of the minor Terror, the momentary Terror, so to speak; whereas, what is the horror of swift death by the axe, compared with lifelong death from hunger, cold, insult, cruelty, and heart-break? What is swift death by lightning compared with death by slow fire at the stake? A city cemetery could contain the coffins filled by that brief Terror which we have all been so diligently taught to shiver at and mourn over; but all France could hardly contain the coffins filled by that older and real Terror—that unspeakably bitter and awful Terror which none of us has been taught to see in its vastness or pity as it deserves.”

- Mark Twain

24. bmitc+fJ 2025-07-07 15:49:50
>>ramon1+(OP)
> I'm not saying this is justified, but what would you have done in their situation?

Individuals would have their lives ruined either from massive fines or jail time.

25. blibbl+nO 2025-07-07 16:22:06
>>ramon1+(OP)
> Pirate and pay the fine is probably a hell of a lot cheaper than individually buying all these books.

$500,000 per infringement...

replies(1): >>jandre+321
◧◩
26. SirMas+uP [discussion] 2025-07-07 16:28:29
>>tmaly+nq
But should the purchase be like a personal license? Or like a commercial license that costs way more?

Because, for example, if you buy a movie on disc, that's a personal license and you can watch it yourself at home. But you can't play it at a large public venue that sells tickets to watch it. You need a different and more expensive license to make money off the content in a larger capacity like that.

replies(1): >>tmaly+Z49
◧◩
27. NoMore+zP [discussion] 2025-07-07 16:28:54
>>pyman+E1
Stealing? In what way?

Training a generative model on a book is the mechanical equivalent of having a human read the book and learn from it. Is it stealing if a person reads the book and learns from it?

replies(3): >>blocko+q91 >>janals+uS1 >>coffee+VU1
◧◩◪
28. thedev+QP [discussion] 2025-07-07 16:30:11
>>asadot+JF
What are you on about? The judge literally said this was not reselling, and that it is transformative and fair use.
◧◩◪
29. suyjur+YR [discussion] 2025-07-07 16:44:24
>>asadot+JF
Yes, of course! In this case, the judge identified three separate instances of copying: (1) downloading books without authorisation to add to their internal library, (2) scanning legitimately purchased books to add to their internal library, and (3) taking data from their internal library for the purposes of training LLMs. The purchasing part is only relevant for (2) — there the judge ruled that this is fair use. This makes a lot of sense to me, since no additional copies were created (they destroyed the physical books after scanning), so this is just a single use, as you say. The judge also ruled that (3) is fair use, but for a different reason. (They declined to decide whether (1) is fair use at this point, deferring to a later trial.)
◧◩
30. jandre+321 [discussion] 2025-07-07 17:39:13
>>blibbl+nO
And the crazy thing is that it might be cheaper when you consider that the alternative is to have your lawyers negotiate with the lawyers for the publishing companies for the right to use the works as training data. Not only is it many, many billable hours just to draw up the contract, but you can be sure that many companies would either not play ball or set extremely high rates. Finally, if the publishing companies did bring a suit against Anthropic, they might be asked to prove each case of infringement, basically to show that a specific work was used in training, which might be difficult since you can't reverse a model to get the inputs. When you're a billion-dollar company it's much easier to get the courts to take your side. This isn't like the music companies suing teenagers who had a Kazaa account.
◧◩
31. js8+951 [discussion] 2025-07-07 17:55:59
>>pyman+E1
> The problem with this thinking is that hundreds of thousands of teachers who spent years writing great, useful books and sharing knowledge and wisdom probably won't sue a billion dollar company for stealing their work. What they'll likely do is stop writing altogether.

I think this is a fantasy. My father cowrote a Springer book about physics. For the effort, he got like $400 and 6 author copies.

Now, you might say he got a bad deal (or the book was bad), but I don't think hundreds of thousands of authors do significantly better. The reality is, people overwhelmingly write because they want to, not because of money.

replies(1): >>pyman+3l2
◧◩
32. eikenb+e91 [discussion] 2025-07-07 18:20:37
>>Timoro+i1
Plus they did it with a profit motive, which would entail criminal proceedings.
◧◩◪
33. blocko+q91 [discussion] 2025-07-07 18:22:18
>>NoMore+zP
It depends on how closely that person can reproduce the original work without a license or attribution.
replies(1): >>lcnPyl+od1
◧◩
34. dragon+I91 [discussion] 2025-07-07 18:24:15
>>Timoro+i1
> 150K per work is the maximum fine for willful infringement

No, it's not.

It's the maximum statutory damages for willful infringement, which this has not been adjudicated to be. It is not a fine; it's an alternative basis of recovery to actual damages + the infringer's profits attributable to the infringement.

Of course, there's also a very wide range of statutory damages; the minimum (if it is not "innocent" infringement) is $750/work.

> 105B+ is more than Anthropic is worth on paper.

The actual amount of 7 million works times $150,000/work is $1.05 trillion, not $105 billion.
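To make the per-work arithmetic concrete, here is a minimal Python sketch that multiplies the 7 million works figure above by the per-work statutory damages amounts quoted in this thread ($750 minimum, $30,000 standard maximum, $150,000 willful maximum). It assumes, purely for illustration, that one flat amount applies to every work; the real total would depend on what per-work amount, if any, is actually awarded.

```python
# Back-of-the-envelope statutory damages totals, using figures quoted in this thread.
# Assumption for illustration only: the same per-work amount applies to all works.
NUM_WORKS = 7_000_000  # number of works cited in the comment above

PER_WORK = {
    "statutory minimum": 750,
    "standard statutory maximum": 30_000,
    "willful maximum": 150_000,
}


def total_damages(per_work: int, works: int = NUM_WORKS) -> int:
    """Return the total if every work is assessed the same per-work amount."""
    return per_work * works


if __name__ == "__main__":
    for label, amount in PER_WORK.items():
        # The ',' format spec inserts thousands separators.
        print(f"{label}: ${total_damages(amount):,}")
```

The willful maximum line prints $1,050,000,000,000, i.e. the $1.05 trillion figure, while even the $750 minimum comes to $5,250,000,000.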

replies(1): >>Timoro+qc1
◧◩◪
35. Timoro+qc1 [discussion] 2025-07-07 18:41:51
>>dragon+I91
> It's the maximum statutory damages for willful infringement, which this has not been adjudicated to be. It is not a fine; it's an alternative basis of recovery to actual damages + the infringer's profits attributable to the infringement.

Yeah, you’re probably right, I’m not a lawyer. The point is that it doesn’t matter what number the law says they should pay, Anthropic can afford real lawyers and will therefore only pay a pittance, if anything.

I’m old enough to remember what the feds did to Aaron Swartz, and I don’t see what Anthropic did that was so different, ethically speaking.

◧◩◪◨
36. lcnPyl+od1 [discussion] 2025-07-07 18:48:50
>>blocko+q91
It actually depends on whether or not they reproduce it and especially what they do with the copy after making it.
replies(1): >>blocko+SC3
◧◩
37. Captai+9e1 [discussion] 2025-07-07 18:53:57
>>maeln+n7
> I don't know why people on HN are giving AI companies a pass for this kind of behavior.

As mentioned in The Fucking Article, there's a legal difference between training an AI that largely doesn't repeat things verbatim (à la Anthropic) and redistributing media as a whole (à la Spotify, Netflix, journals, ad agencies).

◧◩◪
38. janals+uS1 [discussion] 2025-07-08 01:01:14
>>NoMore+zP
> In what way?

Downloading the book without paying for it, which is more or less what the judge said.

◧◩◪
39. coffee+VU1 [discussion] 2025-07-08 01:33:18
>>NoMore+zP
But a language model is not a person, it’s a copy machine with a blender inside.

Photocopying books in their entirety for commercial use is absolutely illegal.

◧◩◪
40. pyman+3l2 [discussion] 2025-07-08 07:11:02
>>js8+951
I see where you are coming from: "My 8-yo son can also build websites".

Writing books is a profession.

Some people write full-time and make a living from it, through book sales, speaking gigs, teaching, or other related work.

Maybe ask Tim O’Reilly what he thinks about this so-called fantasy.

Like I said, Anthropic needs to stop stealing books or face the consequences.

replies(1): >>js8+nF2
◧◩◪◨
41. js8+nF2 [discussion] 2025-07-08 11:34:46
>>pyman+3l2
No, you don't see where I am coming from. And my father was a university professor. I am certainly not opposed to authors being fairly remunerated for their work; that's why I brought up that example.

My point is, the controversy is not an AI corporation vs. 10^5 ordinary teachers. It's a battle of two corporations, or business models, if you will. But regardless of the result, most book authors will continue to get screwed; maybe only the means will change. But it will not prevent them from writing, either. So I don't see any mass writers' protests coming, sorry.

I also don't think Anthropic's AI is going to be any less intelligent if it doesn't read any modern fiction and reads a Wikipedia summary instead. Stories and myths are a human way of understanding the world; machines probably don't need them. And for non-fiction books, there really aren't that many irreplaceable high-profile authors out there. If it can't read, say, Feynman's Lectures on Physics, it can learn the same material from hundreds of other physics textbooks. Maybe they are slightly less well organized, but why should superintelligence care?

replies(1): >>greeni+MN6
◧◩◪◨⬒
42. blocko+SC3 [discussion] 2025-07-08 18:35:24
>>lcnPyl+od1
Sure. I'd say reproducing it and distributing it to someone who happens to ask the right questions would qualify.
replies(1): >>lcnPyl+OR3
◧◩◪◨⬒⬓
43. lcnPyl+OR3 [discussion] 2025-07-08 20:16:56
>>blocko+SC3
Well, right, but that's different from "can reproduce the original work". I "can" start typing out song lyrics but it doesn't mean that I stole the songs I've listened to.
◧◩◪◨⬒
44. greeni+MN6 [discussion] 2025-07-09 21:34:40
>>js8+nF2
You are correct.
◧◩◪
45. tmaly+Z49 [discussion] 2025-07-10 17:57:44
>>SirMas+uP
I think it would have to be commercial since they are making profits from selling inference on the weights that derived from the books.