That’s going to be hard to argue. Where are the copies?
“Having copied the five billion images—without the consent of the original artists—Stable Diffusion relies on a mathematical process called diffusion to store compressed copies of these training images, which in turn are recombined to derive other images. It is, in short, a 21st-century collage tool.“
“Diffusion is a way for an AI program to figure out how to reconstruct a copy of the training data through denoising. Because this is so, in copyright terms it’s no different from an MP3 or JPEG—a way of storing a compressed copy of certain digital data.”
The examples of training diffusion (eg, reconstructing a picture out of noise) will be core to their argument in court. Certainly during training the goal is to reconstruct original images out of noise. But, do they exist in SD as copies? Idk
In fairness, Diffusion is arguably a very complex entropy coding similar to Arithmetic/Huffman coding.
Given that copyright is protectable even on compressed/encrypted files, it seems fair that the “container of compressed bytes” (in this case the Diffusion model) does “contain” the original images no differently than a compressed folder of images contains the original images.
A lawyer/researcher would likely win this case if they re-create 90%ish of a single input image from the diffusion model with text input.
There's a world of difference that you are just writing off.
Which of course then arrives at the problem: the original data plainly isn't stored in a byte exact form, and you can only recover it by providing an astounding specific input string (the 512 bit latent space vector). But that's not data which is contained within Stable Diffusion. It's equivalent to trying to sue a compression codec because a specific archive contains a copyrighted image.
That's plainly untrue, as Stable Diffusion is not just the algorithm, but the trained model—trained on millions of copyrighted images.
SD might know how to violate copyright but is that enough to sue it? Or can you only sue violations it helps create?
That’s said, it does raise the question, “should this precedent be extended to humans?”
i.e. Can humans be taught something based on copyrighted materials in the training set/curriculum?
To address (b) first: Fair Use has long held that educational purposes are a valid reason for using copyrighted materials without express permission—for instance, showing a whole class a VHS or DVD, which would technically require a separate release otherwise.
For (a): I don't know anything about your background in ML, so pardon if this is all obvious, but at least current neural nets and other ML programs are not "AI" in anything like the kind of sense where "teaching" is an apt word to describe the process of creating the model. Certainly the reasoning behind the Fair Use exception for educating humans does not apply—there is no mind there to better; no person to improve the life, understanding, or skills of.