zlacker

[return to "We’ve filed a law­suit chal­leng­ing Sta­ble Dif­fu­sion"]
1. dr_dsh+12 2023-01-14 07:17:25
>>zacwes+(OP)
“Stable Diffusion contains unauthorized copies of millions—and possibly billions—of copyrighted images.”

That’s going to be hard to argue. Where are the copies?

“Having copied the five billion images—without the consent of the original artists—Stable Diffusion relies on a mathematical process called diffusion to store compressed copies of these training images, which in turn are recombined to derive other images. It is, in short, a 21st-century collage tool.”

“Diffusion is a way for an AI program to figure out how to reconstruct a copy of the training data through denoising. Because this is so, in copyright terms it’s no different from an MP3 or JPEG—a way of storing a compressed copy of certain digital data.”

The examples of training diffusion (e.g., reconstructing a picture out of noise) will be core to their argument in court. Certainly, during training, the goal is to reconstruct the original images from noise. But do they exist in SD as copies? Idk
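
For context, a toy sketch of what that denoising training step looks like (this is the generic diffusion setup, not Stability's actual code; the array sizes, noise schedule, and the placeholder denoiser are all made up here):

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((8, 8))                  # stand-in for one training image
    t = rng.uniform(0.1, 0.9)                   # noise level for this training step
    noise = rng.standard_normal((8, 8))
    noisy = np.sqrt(1 - t) * image + np.sqrt(t) * noise   # corrupt the image

    def denoiser(x, t):
        # placeholder for the real model (a large U-Net with learned weights)
        return np.zeros_like(x)

    predicted = denoiser(noisy, t)
    loss = np.mean((predicted - noise) ** 2)    # the model is trained to predict the added noise
    print(loss)

The model only ever sees the noisy version and the noise it has to predict; whether that amounts to "storing a copy" is exactly the open question.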

2. Aerroo+ll1 2023-01-14 19:12:35
>>dr_dsh+12
Models for Stable Diffusion are about 2-8GB in size. 5 billion images means that every image gets about 1 byte.

It seems to me that they're claiming here that Stability has somehow managed to store copies of these images in about 1 byte of space each. That's an incredible compression ratio!
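
Back of the envelope (assuming a ~4GB checkpoint; the exact size doesn't change the conclusion much):

    model_bytes = 4 * 1024**3       # ~4 GB checkpoint, assumed
    n_images = 5_000_000_000        # LAION-5B-scale training set
    print(model_bytes / n_images)   # ~0.86 bytes per image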

3. SillyU+F62 2023-01-15 01:55:57
>>Aerroo+ll1
It is a form of compression, but one that loses so much of the uniqueness of the data, and that loss is what gives it the high ratio. If the concept is a little hard to grasp, think of an AI model as something like a finite state machine, except it also stores affinities and weights describing how the pieces of data relate to each other.

In GPT these pieces are words and phrases, e.g. "Frodo Baggins" has high affinity, while "Frodo Superman" is negligible. Now consider all the words that may link to those words - potentially billions of words (or phrases), but (probably/hopefully) none of them replicated verbatim. The phrases are detached from any specific context because they cover _all contexts_ in the training data. When you speak to GPT, it randomises over these words in response to you, typically choosing the words/phrases with the highest affinity to the words you prompted. This almost gives it the appearance of emergent AI, because it is crossing different concepts (texts) in its answers.
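
A toy sketch of that affinity-weighted choice (the table and numbers are made up purely to illustrate; real models learn these weights over huge vocabularies):

    import random

    # hypothetical affinity table, not real model weights
    affinity = {"Frodo": {"Baggins": 0.90, "walked": 0.05, "Superman": 0.0001}}

    def next_word(prev):
        candidates = affinity[prev]
        return random.choices(list(candidates), weights=list(candidates.values()))[0]

    print(next_word("Frodo"))   # almost always "Baggins", almost never "Superman"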

Stable Diffusion works similarly, but with colours (the "words") and patterns/styles (the "phrases"). Now, if you ask for a green field in the style of Van Gogh, it could be comparing Van Gogh's work to a backdrop from Windows XP. You could argue that, depending on how much of those things it gives you, you are violating copyright. However, that narrow view doesn't account for the fact that although you've specifically asked for Van Gogh, and that's where it concentrates, it's also pulling in work from potentially hundreds of other, lower-affinity sources. It's this dilution which means you'll never see an untainted original source image.
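
A rough illustration of that dilution (made-up "style vectors" and weights just to show the mixing; the real model blends things in a learned latent space, not like this):

    import numpy as np

    # hypothetical sources a prompt might draw on
    van_gogh = np.array([0.9, 0.1, 0.3])
    xp_field = np.array([0.2, 0.8, 0.5])
    the_rest = np.array([0.4, 0.4, 0.4])   # hundreds of low-affinity sources, lumped together

    weights = np.array([0.6, 0.1, 0.3])    # prompt concentrates on Van Gogh, but not exclusively
    blended = weights @ np.vstack([van_gogh, xp_field, the_rest])
    print(blended)                         # a mixture; no single source comes through intact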

So in essence, it's the user who is breaching the copyright by directing the model to concentrate on specific terms in the prompt, not the model itself. The model is simply a set of patterns, and the user is making those patterns breach copyright, which IMHO is no different to the user copying a painting with a brush.

The brush isn't the thing you sue.
