zlacker

[return to "Anthropic destroyed millions of print books to build its AI models"]
1. JohnFe+x3 2025-06-25 21:40:20
>>bayind+(OP)
> In the process, the company cut millions of print books from their bindings, scanned them into digital files, and threw away the originals solely for the purpose of training AI

Oh boy. The more I learn about how genAI companies work, the more detestable they appear to be.

2. Throwa+P6 2025-06-25 22:10:34
>>JohnFe+x3
You got suckered by the clickbait. Destructive scanning (https://en.wikipedia.org/wiki/Book_scanning#Destructive_scan...) isn't unusual for books that are common enough that an individual volume is of no particular value.
3. bayind+68 2025-06-25 22:26:14
>>Throwa+P6
I mean, they could have gotten e-book versions of the books, or even preprint PDFs.

In an era where people are starting to calculate the environmental impact of the jobs they run on the cloud and to optimize it, adding that much load to the recycling system is not a wise choice, only a selfish one.

4. AlotOf+aa 2025-06-25 22:45:11
>>bayind+68
I strongly suspect that dealing with ebooks on this scale might actually be even more onerous than the physical volumes.

The physical stuff is straightforward. Buy books from bulk sellers, rip off the bindings, and put them into off-the-shelf rigs for digitization. It's directly scalable, works with any book, and your main issue is format shifting, which Anthropic successfully argued here. No DRM, you buy exactly the books you need, and every book is processed exactly the same way.

If you try to buy ebooks, you get wrapped up in onerous licensing terms about copying, how you're able to use them, how long you're able to access them, and so on. Many books won't even be available (or can only be licensed alongside a bunch of others), and you have to deal with DRM you can't strip without creating additional copyright issues.

We've somehow created a world where physical objects are more free than bits.
