Oh boy. The more I learn about how genAI companies work, the more detestable they appear to be.
In an era where people are starting to calculate the environmental impact of the jobs they run on the cloud and to optimize it, adding that much load to the recycling system is not a wise choice, only a selfish one.
The physical stuff is straightforward. Buy books from bulk sellers, rip off the bindings, and feed the pages into off-the-shelf rigs for digitization. It's directly scalable, works with any book, and your main issue is format shifting, which Anthropic successfully argued here. No DRM, you buy exactly the books you need, and every book is processed exactly the same way.
If you try to buy ebooks, you get wrapped up in onerous licensing terms about copying, how you're able to use them, how long you're able to access them, and so on. Many books won't even be available (or can only be licensed alongside a bunch of others), and you have to deal with DRM you can't strip without creating additional copyright issues.
We've somehow created a world where physical objects are more free than bits.