zlacker

[parent] [thread] 15 comments
1. Throwa+(OP)[view] [source] 2025-06-25 22:10:34
You got suckered by the clickbait. Destructive scanning (https://en.wikipedia.org/wiki/Book_scanning#Destructive_scan...) isn't unusual for books that are common enough that an individual volume is of no particular value.
replies(2): >>bayind+h1 >>JohnFe+y3
2. bayind+h1[view] [source] 2025-06-25 22:26:14
>>Throwa+(OP)
I mean, they could have gotten e-book versions of the books, or even preprint PDFs.

In an era where people are starting to calculate the environmental impact of the jobs they run on the cloud and to optimize it, adding that much load to the recycling system is not a wise choice, only a selfish one.

replies(3): >>Throwa+P1 >>AlotOf+l3 >>rpdill+3m
◧◩
3. Throwa+P1[view] [source] [discussion] 2025-06-25 22:33:02
>>bayind+h1
I'm sure they would have loved to save the hassle and expense of disassembling physical books. Presumably something legal related or cost related prevented them from going that route.
replies(1): >>JohnFe+I3
◧◩
4. AlotOf+l3[view] [source] [discussion] 2025-06-25 22:45:11
>>bayind+h1
I strongly suspect that dealing with ebooks on this scale might actually be even more onerous than the physical volumes.

The physical stuff is straightforward. Buy books from bulk sellers, rip off everything and put them into off-the-shelf rigs for digitization. It's straightforward, directly scalable, can use any book, and your main issue is format shifting, which Anthropic successfully argued here. No DRM, you buy exactly the books you need, and every book is processed exactly the same way.

If you try to buy ebooks, you get wrapped up in onerous licensing terms about copying, and how you're able to use them, how long you're able to access them, and so on. Many books won't even be available (or can only be licensed alongside a bunch of others) and you have to deal with DRM you can't strip without creating additional copyright issues.

We've somehow created a world where physical objects are more free than bits.

5. JohnFe+y3[view] [source] 2025-06-25 22:46:38
>>Throwa+(OP)
I didn't get suckered by anything. I'm aware of the practice. I find it objectionable. That they did this is just another thing on the growing list of objectionable things that genAI companies seem to enjoy doing.

To be honest, I probably wouldn't have even commented on it if it were the only bad thing these companies do.

replies(2): >>rpdill+Ol >>Captai+Cz
◧◩◪
6. JohnFe+I3[view] [source] [discussion] 2025-06-25 22:47:24
>>Throwa+P1
Yes, they did it as a workaround for copyright. TFA explains that aspect.
replies(1): >>rpdill+hm
◧◩
7. rpdill+Ol[view] [source] [discussion] 2025-06-26 02:09:50
>>JohnFe+y3
It was only legal because they did it this way.

> Ultimately, Judge William Alsup ruled that this destructive scanning operation qualified as fair use—but only because Anthropic had legally purchased the books first, destroyed each print copy after scanning, and kept the digital files internally rather than distributing them. The judge compared the process to "conserv[ing] space" through format conversion and found it transformative.

The very laws that the publishing industry has lobbied so heavily to make so strict are the reason for this behavior.

◧◩
8. rpdill+3m[view] [source] [discussion] 2025-06-26 02:11:57
>>bayind+h1
No, they probably couldn't have. eBooks are notoriously DRMed, and the DMCA makes it illegal to circumvent an effective copy protection mechanism even if you otherwise have legal access to the work. Furthermore, the first sale doctrine doesn't apply to any digital files, and they can't be obtained legally in bulk.
◧◩◪◨
9. rpdill+hm[view] [source] [discussion] 2025-06-26 02:14:54
>>JohnFe+I3
It's not a workaround for copyright. It's to obey copyright. As in: copyright law is the reason they destroyed the books.

Meta didn't have to do any of this. They just used The Pile.

◧◩
10. Captai+Cz[view] [source] [discussion] 2025-06-26 05:36:59
>>JohnFe+y3
If you believe that destroying books is bad, your issue is with copyright law, not the AI companies. The AI companies are just following copyright law -- they are allowed to move data from one format to another (thereby destroying the original), but not copy it.
replies(4): >>baobun+qA >>rasz+NH >>JohnFe+li1 >>justin+aY1
◧◩◪
11. baobun+qA[view] [source] [discussion] 2025-06-26 05:47:27
>>Captai+Cz
Not everything objectionable or unethical should or could necessarily be outlawed. "It's not illegal" is not really an argument or justification for anything.
replies(1): >>Ukv+fF1
◧◩◪
12. rasz+NH[view] [source] [discussion] 2025-06-26 07:13:01
>>Captai+Cz
Specifically, his issue is with the first sale doctrine. If you own it, you can destroy it, and it's none of anyone else's business.
replies(1): >>JohnFe+si1
◧◩◪
13. JohnFe+li1[view] [source] [discussion] 2025-06-26 13:31:40
>>Captai+Cz
> If you believe that destroying books is bad, your issue is with copyright law, not the AI companies

No, my issue is with the companies that do this. The law doesn't enter into it. Just because a thing is legal doesn't mean it's OK.

◧◩◪◨
14. JohnFe+si1[view] [source] [discussion] 2025-06-26 13:32:21
>>rasz+NH
I don't have an issue with the first sale doctrine. It's an important property right.

That doesn't mean I support everything that people have a right to do with their property.

◧◩◪◨
15. Ukv+fF1[view] [source] [discussion] 2025-06-26 15:59:55
>>baobun+qA
I don't think CaptainFever's point is that it's acceptable because it's legal, but rather that copyright law is what prevents them from, say, donating the originals instead of throwing them away.
◧◩◪
16. justin+aY1[view] [source] [discussion] 2025-06-26 18:13:02
>>Captai+Cz
I very much have a problem with both of these things.