zlacker

[parent] [thread] 7 comments
1. astran+(OP)[view] [source] 2022-12-15 12:46:18
> If those mouse images are generated, that implies that Disney content is already part of the training data and models.

It doesn't mean that. You could "find" Mickey in the latent space of any model using textual inversion and an hour of GPU time. He's just a few shapes.

(Case in point: the most popular artist that StableDiffusion 1 users like to imitate is not in the StableDiffusion training images. His name just happens to work in prompts by coincidence.)
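
If you're curious what that looks like in practice, here's a condensed sketch of the standard textual-inversion recipe against the diffusers library: learn one new token embedding while the whole model stays frozen. The token name, hyperparameters, and the next_training_batch loader are all illustrative assumptions, not a tested script:

    import torch
    import torch.nn.functional as F
    from diffusers import StableDiffusionPipeline

    # Learn ONE new token embedding against a frozen Stable Diffusion model.
    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    tokenizer, text_encoder = pipe.tokenizer, pipe.text_encoder

    # Register a placeholder token and give it a trainable embedding row.
    tokenizer.add_tokens("<my-mouse>")
    text_encoder.resize_token_embeddings(len(tokenizer))
    token_id = tokenizer.convert_tokens_to_ids("<my-mouse>")
    embeddings = text_encoder.get_input_embeddings().weight

    # Freeze everything; only the new embedding row will keep its gradient.
    for module in (pipe.unet, pipe.vae, text_encoder):
        module.requires_grad_(False)
    embeddings.requires_grad_(True)
    optimizer = torch.optim.AdamW([embeddings], lr=5e-4)

    for step in range(1000):  # roughly "an hour of GPU time"
        # Hypothetical loader: a handful of target images, (B, 3, 512, 512) in [-1, 1].
        imgs = next_training_batch()
        latents = pipe.vae.encode(imgs).latent_dist.sample()
        latents = latents * pipe.vae.config.scaling_factor
        noise = torch.randn_like(latents)
        t = torch.randint(0, 1000, (latents.shape[0],))
        noisy = pipe.scheduler.add_noise(latents, noise, t)

        ids = tokenizer(["a photo of <my-mouse>"] * latents.shape[0],
                        padding="max_length", truncation=True,
                        max_length=tokenizer.model_max_length,
                        return_tensors="pt").input_ids
        pred = pipe.unet(noisy, t, encoder_hidden_states=text_encoder(ids)[0]).sample

        loss = F.mse_loss(pred, noise)
        loss.backward()
        # Zero gradients for every embedding row except the new token's.
        embeddings.grad[torch.arange(len(tokenizer)) != token_id] = 0
        optimizer.step()
        optimizer.zero_grad()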

replies(2): >>Taywee+I4 >>mcv+fj
2. Taywee+I4[view] [source] 2022-12-15 13:11:29
>>astran+(OP)
If you can find a copyrighted work in that model that wasn't put there with permission, then why would that model and its output not violate the copyright?
replies(2): >>astran+W4 >>mcv+Uk
3. astran+W4[view] [source] [discussion] 2022-12-15 13:12:30
>>Taywee+I4
https://en.wikipedia.org/wiki/The_Library_of_Babel

A latent space that contains every image contains every copyrighted image. But the concept of sRGB is not copyrighted by Disney just yet.

replies(1): >>Taywee+l7
4. Taywee+l7[view] [source] [discussion] 2022-12-15 13:23:50
>>astran+W4
Sure, but this isn't philosophy. An AI model that contains every image is a derivative work of all those images, and so is the output generated from it. It's not an abstract concept or a human brain; it's a pile of real binary data generated from real input.
replies(1): >>astran+3a
5. astran+3a[view] [source] [discussion] 2022-12-15 13:38:11
>>Taywee+l7
StableDiffusion is 4 GB, which works out to roughly two bytes per training image. That's not very derivative; it's actual generalization.
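
To make the arithmetic concrete (the ~2.3 billion figure is the commonly cited LAION-2B-en scale, used here as an assumption):

    # Back-of-the-envelope: checkpoint bytes per training image.
    model_bytes = 4 * 10**9        # ~4 GB SD 1.x checkpoint
    training_images = 2.3 * 10**9  # LAION-2B-en scale (assumed)
    print(model_bytes / training_images)  # ~1.7 bytes per image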

"Mickey" does work as a prompt, but if they took that word out of the text encoder he'd still be there in the latent space, and it's not hard to find a way to construct him out of a few circles and a pair of red shorts.

6. mcv+fj[view] [source] 2022-12-15 14:22:23
>>astran+(OP)
How would that be a coincidence? To respond accurately to an artist's name as a cue, the model has to know the artist, doesn't it?

In any case, in the example images here, the AI clearly knew who Mickey was and used that knowledge to generate Mickey Mouse images. Mickey has to be in the training data.

replies(1): >>esrauc+S51
7. mcv+Uk[view] [source] [discussion] 2022-12-15 14:29:03
>>Taywee+I4
The idea behind that is probably that every artist learns from seeing other artists' copyrighted art, even if they're not allowed to reproduce it. This is easy to see from the fact that art goes through fashions: artists copy styles and ideas from each other and expand on them.

Of course, that probably means those copyrighted images exist in some encoded form in the data or neural network of the AI, and also in our brains. Is that legal? With humans it's unavoidable, but that doesn't have to mean it's also legal for an AI.

But even if those copyrighted images exist in some form in our brains, we know not to reproduce them and pass them off as original. The AI does exactly that. Maybe it needs a feedback mechanism to ensure its generated images don't look too much like copyrighted images from its data set. Maybe an art AI necessarily also has to become a bit of a legal AI.
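
As a rough sketch of what such a feedback mechanism could look like, assuming CLIP-based similarity filtering: the model choice and the 0.92 threshold are purely illustrative, and reference_embeddings stands for a hypothetical precomputed index of training-set images:

    import torch
    from transformers import CLIPModel, CLIPProcessor

    # Illustrative near-duplicate gate: embed the generated image with CLIP
    # and flag it if it sits too close to any training-image embedding.
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def looks_too_familiar(generated_image, reference_embeddings, threshold=0.92):
        inputs = processor(images=generated_image, return_tensors="pt")
        with torch.no_grad():
            emb = model.get_image_features(**inputs)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        refs = reference_embeddings / reference_embeddings.norm(dim=-1, keepdim=True)
        best_match = (refs @ emb.T).max()     # highest cosine similarity in the index
        return best_match.item() > threshold  # True -> hold back for review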

8. esrauc+S51[view] [source] [discussion] 2022-12-15 17:34:07
>>mcv+fj
For other artists, the corpus can include many images whose descriptions contain phrases like "inspired by Banksy". Then the model can learn to generate images in the style of Banksy without having any copyrighted Banksy images in the training set.
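
That kind of caption co-occurrence is easy to check against the released LAION metadata; a quick sketch (the parquet filename is a placeholder, and TEXT is the caption column in LAION's metadata dumps):

    import pandas as pd

    # Count captions that name the artist, regardless of what the images depict.
    df = pd.read_parquet("laion2B-en-part-00000.parquet")  # placeholder file
    mentions = df["TEXT"].str.contains("banksy", case=False, na=False)
    print(f"{mentions.sum()} of {len(df)} captions mention Banksy")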

The Mickey Mouse case, though, is obviously BS: the training data definitely does contain tons of infringing examples of Mickey Mouse. The model didn't somehow reinvent his exact image from first principles.
