zlacker

[return to "We’ve filed a lawsuit challenging Stable Diffusion"]
1. dr_dsh+12[view] [source] 2023-01-14 07:17:25
>>zacwes+(OP)
“Stable Diffusion contains unauthorized copies of millions—and possibly billions—of copyrighted images.”

That’s going to be hard to argue. Where are the copies?

“Having copied the five billion images—without the consent of the original artists—Stable Diffusion relies on a mathematical process called diffusion to store compressed copies of these training images, which in turn are recombined to derive other images. It is, in short, a 21st-century collage tool.”

“Diffusion is a way for an AI program to figure out how to reconstruct a copy of the training data through denoising. Because this is so, in copyright terms it’s no different from an MP3 or JPEG—a way of storing a compressed copy of certain digital data.”

The examples of training diffusion (e.g., reconstructing a picture out of noise) will be core to their argument in court. Certainly the goal during training is to reconstruct original images out of noise. But do they exist in SD as copies? Idk
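For context, a rough sketch of what one denoising training step looks like (illustrative PyTorch only; `unet` and `alphas_cumprod` are placeholders, not Stable Diffusion's actual code). The objective optimizes the network to predict the noise that was added, rather than to emit a stored image:

  import torch
  import torch.nn.functional as F

  def training_step(unet, x0, alphas_cumprod, num_timesteps=1000):
      # x0: a batch of training images (or their latents), shape (B, C, H, W)
      t = torch.randint(0, num_timesteps, (x0.shape[0],), device=x0.device)
      noise = torch.randn_like(x0)
      a = alphas_cumprod[t].view(-1, 1, 1, 1)
      x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise   # corrupt the image per the noise schedule
      pred_noise = unet(x_t, t)                      # the network predicts the added noise
      return F.mse_loss(pred_noise, noise)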

◧◩
2. akjetm+D3[view] [source] 2023-01-14 07:36:22
>>dr_dsh+12
I don't think you have to reproduce an entire original work to demonstrate copyright violation. Think about sampling in hip hop, for example. A 2-second sample, distorted, re-pitched, etc., can be grounds for a copyright violation.
◧◩◪
3. Salgat+R3[view] [source] 2023-01-14 07:41:38
>>akjetm+D3
The difference here is that the images aren't stored; rather, an extremely abstract description of each image was used to adjust a network of millions of nodes very slightly in a tiny direction. No semblance of the original image even remotely exists in the model.
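To make "adjust very slightly" concrete, here's an illustrative sketch (the tiny model and learning rate are made up, not SD's real training code): one image's entire contribution is a single small gradient step applied to weights shared with billions of other images.

  import torch

  model = torch.nn.Linear(512, 512)                # stand-in for the ~1B UNet parameters
  opt = torch.optim.SGD(model.parameters(), lr=1e-4)

  def update_from_one_image(features, target):
      # One training image's whole contribution: one tiny gradient step.
      opt.zero_grad()
      loss = torch.nn.functional.mse_loss(model(features), target)
      loss.backward()
      opt.step()                                   # the image itself is never stored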
◧◩◪◨
4. AlotOf+D7[view] [source] 2023-01-14 08:22:47
>>Salgat+R3
This is very much a 'color of your bits' topic, but I'm not sure why the internal representation matters. It's pretty trivial to recreate famous works like the Mona Lisa or Starry Night or Monet's Water Lily Pond. Obviously some representation of the originals exists inside the model+prompt. Why wouldn't that apply to other images in the training set?
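Trivially so: with the Hugging Face diffusers library (the model id and prompt here are just illustrative), a one-line prompt is enough to get something recognizably close.

  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
  image = pipe("The Starry Night by Vincent van Gogh, oil on canvas").images[0]
  image.save("starry_night_like.png")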
◧◩◪◨⬒
5. XorNot+x9[view] [source] 2023-01-14 08:42:38
>>AlotOf+D7
Because you're silently invoking additional data (the prompt + noise seed), which is not present in the trained weights. Any given output only exists once you supply that prompt + noise seed.

An MPEG codec doesn't contain every movie in the world just because it could represent them if given the right file.

The white light coming off a blank canvas also doesn't contain a copy of the Mona Lisa which will be revealed once someone obscures some of the light.

◧◩◪◨⬒⬓
6. ifdefd+0l[view] [source] 2023-01-14 10:57:19
>>XorNot+x9
OK so let me encrypt a movie and distribute that. Then you tell people they need to invoke additional data to watch the movie. Also give some hints (try the movie title lol).
◧◩◪◨⬒⬓⬔
7. XorNot+KB[view] [source] 2023-01-14 13:40:21
>>ifdefd+0l
If you distribute a random byte stream, and someone uses that as a one time pad to encrypt a movie, then are you distributing the movie?

The answer is of course not, and the same principle applies if someone uses Stable Diffusion to find a latent space encoding for a copyrighted image (the 231 byte number - had to go double check what the grid size actually is).
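In code, the one-time-pad point looks roughly like this (a sketch; file names are made up): either half alone is indistinguishable from random noise, and only the two together reproduce the movie.

  import os

  def xor_bytes(a, b):
      return bytes(x ^ y for x, y in zip(a, b))

  movie = open("movie.mp4", "rb").read()
  pad = os.urandom(len(movie))          # the "random byte stream" being distributed
  ciphertext = xor_bytes(movie, pad)    # someone else derives this afterwards

  # Neither pad nor ciphertext alone says anything about the movie;
  # only the combination recovers it.
  assert xor_bytes(ciphertext, pad) == movie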

◧◩◪◨⬒⬓⬔⧯
8. ifdefd+761[view] [source] 2023-01-14 17:45:36
>>XorNot+KB
I think it boils down to one question: can you prompt the model to show mostly unchanged pictures from artists? If so, then it's definitely problematic. If not, then I don't have enough knowledge of the topic to give a strong opinion. (My previous answer was just a use case that fits your argument.)
◧◩◪◨⬒⬓⬔⧯▣
9. XorNot+qE1[view] [source] 2023-01-14 21:19:40
>>ifdefd+761
I mean no, it doesn't. It's like drawing a copyrighted work in Photoshop: the act of creating it is the violation; it doesn't prove that Photoshop contains the content directly.

The way SD model weights work, if you managed to prompt engineer a recreation of one specific work, it would only have been generated as a product of all the information in the entire training set + noise seed + the prompt. And the prompt wouldn't look anything like a reasonable description of any specific work.

Which is to say, it means nothing, because you can equally generate a likeness of works which are known not to be included in the training set (easy: you ask for a latent encoding of the image and it gives you one). It's equivalent to a JPEG codec.
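That codec comparison can be made concrete with a sketch using diffusers' autoencoder (the model id and file names are just examples): any image, whether or not it was in the training set, can be encoded into SD's latent space and decoded back, the same way a JPEG codec handles images it has never "seen".

  import torch
  from PIL import Image
  from torchvision.transforms.functional import to_tensor, to_pil_image
  from diffusers import AutoencoderKL

  vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

  img = Image.open("my_own_photo.png").convert("RGB").resize((512, 512))
  x = to_tensor(img).unsqueeze(0) * 2 - 1            # scale pixels to [-1, 1]

  with torch.no_grad():
      latents = vae.encode(x).latent_dist.sample()   # the "latent encoding"
      recon = vae.decode(latents).sample             # and back to pixels

  to_pil_image(((recon[0] + 1) / 2).clamp(0, 1)).save("reconstructed.png")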
