With LLMs, we have an aspect of a text corpus that its creators were not using (the language patterns), had no plans for, and had no idea could even be used. Then, when someone comes along and uses it — not to reproduce anything, but to provide minute iterative feedback during training — they run in and try to extract money. It's parasitism. It doesn't benefit society; it only benefits the troll, and there is no reason courts should enforce it.
Someone should try to show that a NYT article can be generated autoregressively and argue it's therefore not copyrightable.
You can get a little discombobulated reading the comments from the nerds / subject-matter idiots on this site.
No piracy or even AI was required here. Google's defense was that its product couldn't reproduce any book in its entirety, which was proven, turning the case into a question of Fair Use instead. Since it was much harder to litigate on those grounds, Google tried pressuring the authors into a settlement before the District Court eventually ruled in Google's favor altogether.
OpenAI's lawyers are well aware of the precedent in copyright law. They're going to argue that their use is Fair Use, and they might get away with it.