We’ve filed a lawsuit challenging Stable Diffusion

>>zacwes+(OP)
“Stable Diffusion contains unauthorized copies of millions—and possibly billions—of copyrighted images.”

That’s going to be hard to argue. Where are the copies?

“Having copied the five billion images—without the consent of the original artists—Stable Diffusion relies on a mathematical process called diffusion to store compressed copies of these training images, which in turn are recombined to derive other images. It is, in short, a 21st-century collage tool.”

“Diffusion is a way for an AI program to figure out how to reconstruct a copy of the training data through denoising. Because this is so, in copyright terms it’s no different from an MP3 or JPEG—a way of storing a compressed copy of certain digital data.”

The examples of training diffusion (e.g., reconstructing a picture out of noise) will be core to their argument in court. Certainly during training the goal is to reconstruct original images out of noise. But do they exist in SD as copies? Idk

>>dr_dsh+12
> That’s going to be hard to argue. Where are the copies?

In fairness, Diffusion is arguably a very complex entropy coding similar to Arithmetic/Huffman coding.

Given that copyright is protectable even on compressed/encrypted files, it seems fair that the “container of compressed bytes” (in this case the Diffusion model) does “contain” the original images no differently than a compressed folder of images contains the original images.

A lawyer/researcher would likely win this case if they re-create 90%ish of a single input image from the diffusion model with text input.

>>yazadd+X3
Great. Now the defence shows an artist that can recreate an image. Cool, now people who look at images get copyright suits filed against them for encoding those images in their heads.

>>anothe+96
Just because I look at an image does not mean that I can recreate it. Storing it in the training data means the AI can recreate it.

There's a world of difference that you are just writing off.

>>dylan6+07
No, it means there is a 512-bit number you can combine with the training data to reproduce a reasonable though not exact likeness (attempts to use SD and others as compression algorithms show they're pretty bad at it, because while they can get "similar" they'll outright confabulate details in a plausible-looking way, i.e. redrawing the streets of San Francisco in images of the Golden Gate Bridge).

Which of course then arrives at the problem: the original data plainly isn't stored in a byte-exact form, and you can only recover it by providing an astoundingly specific input string (the 512-bit latent space vector). But that's not data which is contained within Stable Diffusion. It's equivalent to trying to sue a compression codec because a specific archive contains a copyrighted image.
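
To make the "byte-exact" distinction concrete, here is a rough sketch in Python. It is not how Stable Diffusion works; the toy 4-bit quantizer and the made-up `original` byte string are assumptions chosen only to contrast a lossless archive (every byte comes back) with a lossy scheme (the decoder has to guess the missing detail).

```python
import zlib

# Stand-in for image bytes (hypothetical data, not a real image).
original = bytes((i * 37 + (i // 16) * 11) % 256 for i in range(4096))

# Lossless codec: the archive alone is enough to recover every byte.
archive = zlib.compress(original)
restored = zlib.decompress(archive)


def lossy_encode(data: bytes) -> bytes:
    """Toy lossy encoder: keep only the top 4 bits of each byte."""
    return bytes(b >> 4 for b in data)


def lossy_decode(codes: bytes) -> bytes:
    """Toy decoder: fill the discarded low bits with a fixed midpoint guess."""
    return bytes((c << 4) | 8 for c in codes)


approx = lossy_decode(lossy_encode(original))
max_error = max(abs(a - b) for a, b in zip(original, approx))

print("lossless round trip is byte-exact:", restored == original)
print("lossy round trip is byte-exact:", approx == original)
print("largest per-byte error in lossy copy:", max_error)
```

The lossless path is what a zip of images does. The lossy path is closer in spirit to the "similar but confabulated" reconstructions described above, though a real model's errors are far less uniform than this quantizer's.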

>>XorNot+99
> It's equivalent to trying to sue a compression codec because a specific archive contains a copyrighted image.

That's plainly untrue, as Stable Diffusion is not just the algorithm, but the trained model—trained on millions of copyrighted images.

>>danari+SK
But in fairness, even a human could know how to violate copyright but cannot be sued until they do violate it.

SD might know how to violate copyright but is that enough to sue it? Or can you only sue violations it helps create?

>>yazadd+031
I would assert (with no legal backing, since this is the first suit that actually attempts to address the issue either way) that the trained model is a copyright infringement in itself. It is a novel kind of copyright infringement, to be sure, but I believe that use of copyrighted material in a neural net's training set without the creator's permission should be considered copyright infringement without any further act required to make it so.

>>danari+381
I think that is a very fair argument. It may win in court; it may lose. I’m excited for the precedent either way.

That said, it does raise the question, “should this precedent be extended to humans?”

That is, can humans be taught something based on copyrighted materials in the training set/curriculum?

>>yazadd+oN1
I think this is a reasonable question for the uninitiated—those for whom "training a neural network" seems like it would be a lot like "teaching a human"—but for those with deeper understanding (tbh, I would only describe my knowledge in both these areas as that of an interested amateur), it is a) a poor analogy, and b) already a settled question in law.

To address (b) first: Fair Use has long held that educational purposes are a valid reason for using copyrighted materials without express permission—for instance, showing a whole class a VHS or DVD, which would technically require a separate release otherwise.

For (a): I don't know anything about your background in ML, so pardon if this is all obvious, but at least current neural nets and other ML programs are not "AI" in anything like the kind of sense where "teaching" is an apt word to describe the process of creating the model. Certainly the reasoning behind the Fair Use exception for educating humans does not apply—there is no mind there to better; no person to improve the life, understanding, or skills of.
