It’s very unlikely that simply training an LLM on “unlicensed” work constitutes infringement. Possibly the model itself, when published, would represent a derivative work, but it’s unlikely that most output would, unless specifically prompted to be.
"Create a video of a girl running through a field in the style of Studio Ghibli."
There, someone has specifically prompted the AI to create something visually similar to X.
But would you still consider it a derivative work if you replaced the words "Studio Ghibli" with a few sentences describing their style that ultimately produces the same output?
This is why the lobby is now pushing governments not to allow any regulation of AI, even where courts disagree.
IMHO what will happen anyway is that at some point the companies will "solve" licensing by training models purely on older synthetic LLM output released as "public research" (which of course will still encode the "human" weights, but they will claim that doesn't matter).
It’s important to note that copyright applies to copying/publishing/distributing - you can do whatever you want with copyrighted works by yourself.
Of course, that still won’t make artists happy, because many of them believe things like styles can be copyrighted, which isn’t true.
If we believe that authors should be able to decide how their work is used, then they can for sure say no to machine learning. If we don't believe in intellectual property, then everything is up for grabs. I'm OK with that, but the corps are not.