I don’t necessarily fault OpenAI’s decision to initially train their models without entering into licensing agreements - they probably wouldn’t exist and the generative AI revolution may never have happened if they put the horse before the cart. I do think they should quickly course correct at this point and accept the fact that they clearly owe something to the creators of content they are consuming. If they don’t, they are setting themselves up for a bigger loss down the road and leaving the door open for a more established competitor (Google) to do it the right way.
This is also related to earlier studies about OpenAI where their models have a bad habit of just regurgitating training data verbatim. If your trained data is protected IP you didn’t secure the rights for then that’s a real big problem. Hence this lawsuit. If successful, the floodgates will open.
Is using something, in its entirety, as a tiny bit of a massive data set, in order to produce something novel... infringing?
That's a pretty weird question that never existed when copyright was defined.