In an era where people are starting to calculate the environmental impact of the jobs they run on the cloud and to optimize it, adding that much load to the recycling system is not a wise choice, only a selfish one.
The physical stuff is straightforward. Buy books from bulk sellers, rip them apart, and put the pages into off-the-shelf rigs for digitization. It's directly scalable, works with any book, and your main issue is format shifting, which Anthropic successfully argued here. No DRM, you buy exactly the books you need, and every book is processed exactly the same way.
If you try to buy ebooks, you get wrapped up in onerous licensing terms about copying, how you're able to use them, how long you're able to access them, and so on. Many books won't even be available (or can only be licensed alongside a bunch of others), and you have to deal with DRM you can't strip without creating additional copyright issues.
We've somehow created a world where physical objects are more free than bits.
To be honest, I probably wouldn't have even commented on it if it were the only bad thing these companies do.
> Ultimately, Judge William Alsup ruled that this destructive scanning operation qualified as fair use—but only because Anthropic had legally purchased the books first, destroyed each print copy after scanning, and kept the digital files internally rather than distributing them. The judge compared the process to "conserv[ing] space" through format conversion and found it transformative.
The very laws that the publishing industry has lobbied so heavily to make so strict are the reason for this behavior.
Meta didn't have to do any of this. They just used The Pile.
No, my issue is with the companies that do this. The law doesn't enter into it. Just because a thing is legal doesn't mean it's OK.
That doesn't mean I support everything that people have a right to do with their property.