This is OK and fair use: Training LLMs on copyrighted work, since it's transformative.
This is not OK and not fair use: pirating data, or creating a big repository of pirated data that isn't necessarily for AI training.
Overall seems like a pretty reasonable ruling?
Personally I like to frame most AI problems by substituting a human (or humans) for the AI. Works pretty well most of the time.
In this case, if you hired a bunch of artists/writers who had somehow never seen a Disney movie, and made them watch all the movies to train them to make crappy Disney clones, that would certainly be legal, but only if they had legit copies in the training room. Pirating the movies would be illegal.
The downside, though, is that it does create a training moat. If you want to create the super-brain AI that's conversant with the corpus of copyrighted human literature, you're going to need a training library worth millions.
How many copies? They're not serving a single client.
Libraries need to have multiple e-book licenses, after all.
It changes the definition of what a "legal copy" is, but the general idea that the copy must be legal still stands.