That’s going to be hard to argue. Where are the copies?
“Having copied the five billion images—without the consent of the original artists—Stable Diffusion relies on a mathematical process called diffusion to store compressed copies of these training images, which in turn are recombined to derive other images. It is, in short, a 21st-century collage tool.“
“Diffusion is a way for an AI program to figure out how to reconstruct a copy of the training data through denoising. Because this is so, in copyright terms it’s no different from an MP3 or JPEG—a way of storing a compressed copy of certain digital data.”
The examples of training diffusion (eg, reconstructing a picture out of noise) will be core to their argument in court. Certainly during training the goal is to reconstruct original images out of noise. But, do they exist in SD as copies? Idk
The law doesn't recognize a mathematical computer transformation as creating a new work with original copyright.
If you give me an image, and I encrypt it with a randomly generated password, and then don't write down the password anywhere, the resulting file will be indistinguishable from random noise. No one can possibly derive the original image from it. But, it's still copyrighted by the original artist as long as they can show "This started as my image, and a machine made a rote mathematical transformation to it" because machine's making rote mathematical transformations cannot create new copyright.
The argument for stable diffusion would be that even if you cannot point to any image, since only algorithmic changes happened to the inputs, without any human creativity, the output is a derived work which does not have its own unique copyright.
The idea, in stage one, was to split a file into chunks and xor those with other random chunks (equivalent to a one-time pad), those chunks as well as the created random chunks then got shared around the networks, with nobody hosting both parts of a pair.
The next stage is that future files inserted into the network would not create new random chunks but randomly use existing chunks already in the network. The result is a distributed store of chunks each of which is provably capable of generating any other chunk given the right pair. The correlations are then stored in a separate manifest.
It feels like such a system is some kind of entropy coding system. In the limit the manifest becomes the same size as the original data. At the same time though, you can prove that any given chunk contains no information. I love thinking about how the philosophy of information theory interacts with the law.
Yes, on a technical level, those chunks are random data. On the legal side, however, those chunks are illegal copyright infringement because that is their intent, and there is a process that allows the intent to happen.
I can't really say it better than this post does, so I highly recommend reading it: https://ansuz.sooke.bc.ca/entry/23
But that's not what people use Stable Diffusion for: people use Stable Diffusion to create new works which don't previously exist as that combination of colors/bytes/etc.
Artists don't have copyright on their artistic style, process, technique or subject matter - only on the actual artwork they output or reasonable similarities. But "reasonable similarity" covers exactly that intent - an intent to simply recreate the original.
People keep talking about copyright, but no one's trying to rip off actual existing work. They're doing things like "Pixar style, ultra detailed gundam in a flower garden". So you're rocking up in court saying "the intent is to steal my clients work" - but where is the clients line of gundam horticultural representations? It doesn't exist.
You can't copyright artistic style, only actual output. Artists are fearful that the ability to emulate style means commissions will dry up (this is true) but you've never had copyright protection over style, and it's not even remotely clear how that would work (and, IMO, it would be catastrophic if it was - there's exactly one group of megacorps who would now be in a position to sue everyone because try defining "style" in a legal sense).
Copyright infringement can happen without intending to infringe copyright.
Various music copyright cases start with "Artist X sampled some music from artist Y, thinking it was transformative and fair use". The court, in some of these cases, have found something the artist _intended_ to be transformative to in fact be copyright infringement.
> You can't copyright artistic style, only actual output
You copyright outputs, and then works that are derived from those outputs are potentially copyrighted. Stable Diffusion's outputs are clearly defined from the training set, basically by definition of what neural networks are.
It's less clear they're definitely copyright-infringing derivative works, but it's far less clearcut than how you're phrasing it.