zlacker

1. Aerroo+ (OP) 2023-01-14 19:12:35
Models for Stable Diffusion are about 2-8 GB in size. With 5 billion training images, that works out to about 1 byte per image.

It seems to me that they're claiming here that Stability has somehow managed to store copies of these images in about 1 byte of space each. That's an incredible compression ratio!
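
Back-of-envelope, taking the middle of that size range (numbers are rough, obviously):

    # rough arithmetic: bytes available per image if the model "stored" them all
    model_bytes = 4e9   # ~4 GB checkpoint, middle of the 2-8 GB range
    num_images = 5e9    # ~5 billion training images
    print(model_bytes / num_images)  # -> 0.8, i.e. under a byte per image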

replies(1): >>SillyU+kL
2. SillyU+kL 2023-01-15 01:55:57
>>Aerroo+(OP)
It is a form of compression that loses much of the uniqueness, which is what gives it the high ratio. If the concept is a little hard to grasp, consider an AI model like a finite state machine, but one that also stores affinities and weights for the data's relationships to each other.

In GPT this is words and phrases: "Frodo Baggins" has high affinity, while "Frodo Superman" will be negligible. Now consider all the words that may link to those words - potentially billions of words (or phrases), but (probably/hopefully) none replicated. The phrases are out of specific context because they cover _all contexts_ in the training data. When you speak to GPT it randomises these words in response to you, typically choosing the words/phrases with the highest affinity to the words you prompted. This almost gives it the appearance of emergent AI, because it is crossing different concepts (texts) in its answers.
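
A toy sketch of what I mean, with completely made-up affinity numbers (this illustrates weighted choice, not the real architecture):

    import random

    # hypothetical affinities for words following "Frodo"
    affinity = {"Baggins": 0.90, "said": 0.05, "walked": 0.04, "Superman": 0.01}

    # pick the next word, weighted by affinity to the prompt word
    words, weights = zip(*affinity.items())
    print("Frodo", random.choices(words, weights=weights)[0])  # almost always "Baggins"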

Stable Diffusion works similarly, but with colours (words) and patterns/styles (phrases). Now if you ask for a green field in the style of Van Gogh, it could compare Van Gogh's work to a backdrop from Windows XP. You could argue that, depending on how much of those things it gives you, you are violating copyright. However, that narrow view doesn't take into account that although you've specifically asked for Van Gogh, and that's where it concentrates, it's also pulling in work from potentially hundreds of other lower-affinity sources. It's this dilution which means you'll never see an untainted original source image.
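
Crude sketch of the dilution point, pretending each source contributes a tiny "style vector" (the weights and vectors are invented):

    import random
    random.seed(0)

    # invented weights: the prompt concentrates on Van Gogh, but ~100
    # lower-affinity sources each still leak in a little
    weights = {"van_gogh": 0.5, "xp_bliss": 0.05}
    weights.update({"other_%d" % i: 0.45 / 100 for i in range(100)})

    # pretend each source has a small "style vector"; output = weighted mix
    styles = {name: [random.random() for _ in range(4)] for name in weights}
    output = [sum(weights[n] * styles[n][d] for n in weights) for d in range(4)]
    print(output)  # a blend; no single source's vector survives untouched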

So in essence, it's the user who is breaching the copyright by telling the prompt to concentrate on specific terms, not the model. The model is simply a set of patterns, and the user is making those patterns breach copyright, which IMHO is no different to the user copying a painting with a brush.

The brush isn't the thing you sue.

replies(1): >>Aerroo+Mg1
3. Aerroo+Mg1 2023-01-15 09:04:16
>>SillyU+kL
I think so too. I also think that this is a dangerous issue, because we don't really know how our brains work. If we set legal restrictions on this and then it turns out our brains work in a similar manner, then what?