zlacker

[parent] [thread] 15 comments
1. realus+(OP)[view] [source] 2023-01-14 07:27:09
> Stable Diffusion relies on a mathematical process called diffusion to store compressed copies of these training images, which in turn are recombined to derive other images. It is, in short, a 21st-century collage tool.

Just no, that's not how any of that works.

I guess that lie is convenient for legitimizing the lawsuit.

replies(3): >>fruit2+z >>idle_z+B >>lelant+82
2. fruit2+z[view] [source] 2023-01-14 07:33:54
>>realus+(OP)
How does it work then? :)
replies(3): >>realus+J >>ben_w+b2 >>Partia+E6
3. idle_z+B[view] [source] 2023-01-14 07:34:02
>>realus+(OP)
It's a pretty funny assertion. The whole point of ML models is to take training data and learn something general from it, the common threads, such that it can identify/generate more things like the training examples. If the model were, as they assert, just compressing and reproducing/collaging training images, then that would just indicate that the engineers of the model failed to prevent overfitting. So basically they're calling StabilityAI's engineers bad at their job.
replies(1): >>realus+s6
◧◩
4. realus+J[view] [source] [discussion] 2023-01-14 07:35:54
>>fruit2+z
It works the opposite way: the training images are there to help the model generalize features.

Reproducing parts of existing images in the dataset is called overfitting and is considered a failure of the model.

replies(1): >>8n4vid+55
5. lelant+82[view] [source] 2023-01-14 07:54:58
>>realus+(OP)
That's a lie, sure, but if they had instead claimed:

The output of stable diffusion isn't possible without first examining millions of copyrighted images

Then the suit looks a little more solid, because (as you pointed out) it isn't possible for the Stable Diffusion owner to know which of those copyrighted images had clauses that prevent Stable Diffusion training and similar usage.

The whole problem goes away once artists and photographers start using a license that explicitly disallows any use of the work as training data for any automated training.

replies(1): >>iamacy+K6
◧◩
6. ben_w+b2[view] [source] [discussion] 2023-01-14 07:56:50
>>fruit2+z
Computerphile has friendly introductions to just about everything: https://youtu.be/1CIpzeNxIhU
◧◩◪
7. 8n4vid+55[view] [source] [discussion] 2023-01-14 08:26:22
>>realus+J
how do you measure success?

i wrote an OCR program in college. we split the data set in half. you train it on one half then test it against the other half.

you can train stable diffusion on half the images, but then what? you use the image descriptions of the other half and measure how similar they are? in essence, attempting to reproduce exact replicas. but i guess even then it wouldn't be copyright infringement if those images weren't used in the model. more like me describing something vividly to you and asking you to paint it and then getting angry at you because it's too accurate
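
For what it's worth, that half-split evaluation can be sketched in a few lines. train_fn, model.generate and similarity_fn below are hypothetical placeholders, not real Stable Diffusion APIs:

    import random

    def holdout_eval(dataset, train_fn, similarity_fn, seed=0):
        """dataset: list of (caption, image) pairs.
        train_fn(pairs) -> model exposing generate(caption) (hypothetical).
        similarity_fn(img_a, img_b) -> float, higher means more similar."""
        rng = random.Random(seed)
        pairs = list(dataset)
        rng.shuffle(pairs)
        half = len(pairs) // 2
        train_set, test_set = pairs[:half], pairs[half:]

        model = train_fn(train_set)                   # fit on one half only
        scores = [similarity_fn(model.generate(caption), original)
                  for caption, original in test_set]  # prompt with held-out captions
        return sum(scores) / len(scores)              # mean similarity to held-out originals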

replies(2): >>Partia+07 >>Lerc+w9
◧◩
8. realus+s6[view] [source] [discussion] 2023-01-14 08:39:59
>>idle_z+B
As a side discussion, is there any research model which tries to do what they describe? Like overfitting as much as possible as a way to compress data. It might be useful in different ways.
replies(1): >>visarg+H7
◧◩
9. Partia+E6[view] [source] [discussion] 2023-01-14 08:42:32
>>fruit2+z
Diffusion models learn a transformation operator. The parameters are adjusted such that the operator maximises the evidence lower bound, or in other words, increases the likelihood of observing a slightly less noisy version of the input.

The guidance component is a vector representation of the text that changes where we are in the sample space. A change in the sample space changes the likelihood, so for different prompts the likelihood of the same output image for the same input image will be different.

Since the model is trained to maximise the ELBO, it will produce a change that moves the image closer to the prompt.

A good way to think about it is this: given a classifier, I can select a target class, compute the derivative of the target-class score with respect to the input, and apply that derivative to the input. This puts the input closer to my target class.
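
A rough sketch of that idea, assuming a differentiable PyTorch classifier; the function name and step size are illustrative, not from any particular implementation:

    import torch

    def step_toward_class(classifier, x, target_class, step_size=0.1):
        x = x.clone().detach().requires_grad_(True)
        logits = classifier(x)                  # shape: (1, num_classes)
        score = logits[0, target_class]         # score of the class we want more of
        grad, = torch.autograd.grad(score, x)   # d(score) / d(input)
        return (x + step_size * grad).detach()  # one gradient-ascent step on the input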

From the perspective of some models (score models), they produce the gradient of the log-density of the samples, so it's a bit similar to computing a derivative via a classifier.

The above was concerned with what the NN was doing.

The algorithm applies the operator for a number of steps, progressively improving the image. In some probabilistic models, you can think of this as the inverse of a stochastic gradient descent procedure (meaning a series of steps) that, with some stochasticity, reaches a high-value region of the density.

However, it turns out that learning this operation doesn’t have to be grounded in probability theory and graphical models.

As long as the NN learns a sufficiently good recovery operator, diffusion will construct something based on the properties of the dataset that has been used.

At no point, however, are there condensed representations of images, since the NN is not learning to produce an image from zero in one step. It merely learns to undo some operation applied to the input.
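
Very roughly, the sampling loop described above looks like the sketch below. The denoiser and its text conditioning are stand-ins, and the update glosses over the actual noise schedule and coefficients used by real diffusion samplers:

    import torch

    @torch.no_grad()
    def sample(denoiser, text_embedding, shape, num_steps=50):
        x = torch.randn(shape)                             # start from pure noise, no stored image
        for t in reversed(range(num_steps)):
            less_noisy = denoiser(x, t, text_embedding)    # learned recovery operator
            noise = torch.randn_like(x) if t > 0 else 0.0  # stochasticity of the walk
            x = less_noisy + 0.1 * noise                   # step toward a higher-density region
        return x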

For the probabilistic view, read Denoising Diffusion Probabilistic Models and its references, in particular on Langevin dynamics. It includes citations to score models as well.

For the non-probabilistic view, read Cold Diffusion.

For using the classifier gradient to update an image towards another class, read about adversarial generation via input gradients.

replies(1): >>visarg+48
◧◩
10. iamacy+K6[view] [source] [discussion] 2023-01-14 08:43:30
>>lelant+82
> The whole problem goes away once artists and photographers start using a license that explicitly disallows any use of the work as training data for any automated training.

A license which should be opt-in, not opt-out.

Of course, it’s opt-out because they know, fundamentally, that most artists would not want to opt-in.

replies(1): >>lelant+98
◧◩◪◨
11. Partia+07[view] [source] [discussion] 2023-01-14 08:45:37
>>8n4vid+55
The FID score (Fréchet Inception Distance) is one measure of success.

Instead of aiming to reproduce exact replicas, you run both generated and original images through a classifier (Inception, in practice) and take the activations feeding its last layer. Then you measure the differences between the statistics of the two sets of features.

Wikipedia has a good article on this.
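
Given two sets of such feature vectors (e.g. penultimate-layer Inception activations), the distance itself is a short computation. A sketch assuming numpy/scipy, with the feature extraction step omitted:

    import numpy as np
    from scipy.linalg import sqrtm

    def fid(real_feats, gen_feats):
        """real_feats, gen_feats: (N, D) arrays of classifier features."""
        mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
        cov_r = np.cov(real_feats, rowvar=False)
        cov_g = np.cov(gen_feats, rowvar=False)
        covmean = sqrtm(cov_r @ cov_g)
        if np.iscomplexobj(covmean):          # numerical noise can leave tiny imaginary parts
            covmean = covmean.real
        diff = mu_r - mu_g
        return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))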

◧◩◪
12. visarg+H7[view] [source] [discussion] 2023-01-14 08:53:59
>>realus+s6
Yes, look at NeRF (neural radiance fields) and SIREN (Implicit Neural Representations with Periodic Activation Functions)
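
The "overfit as compression" idea can be illustrated with a toy implicit image network: fit a small MLP to map pixel coordinates to colours for a single image, then keep only the weights. This is just a sketch of the concept, not the actual SIREN architecture (which uses sine activations and a specific initialisation):

    import torch
    import torch.nn as nn

    class TinyImplicitImage(nn.Module):
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2, hidden), nn.Tanh(),    # SIREN proper uses sine activations here
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, 3),               # (x, y) -> (r, g, b)
            )

        def forward(self, coords):
            return self.net(coords)

    def fit_single_image(image, steps=2000, lr=1e-3):
        """image: (H, W, 3) tensor in [0, 1]; deliberately overfit one image."""
        h, w, _ = image.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
        targets = image.reshape(-1, 3)

        model = TinyImplicitImage()
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((model(coords) - targets) ** 2).mean()
            loss.backward()
            opt.step()
        return model                                # the weights are the "compressed" image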
replies(1): >>realus+3D
◧◩◪
13. visarg+48[view] [source] [discussion] 2023-01-14 08:57:51
>>Partia+E6
> A good way to think about it is this: given a classifier, I can select a target class, compute the derivative of the target-class score with respect to the input, and apply that derivative to the input. This puts the input closer to my target class.

excellent description, thanks

◧◩◪
14. lelant+98[view] [source] [discussion] 2023-01-14 08:58:56
>>iamacy+K6
> A license which should be opt-in, not opt-out.

I dunno whether the opt-in has to happen at the legislation level.

After all, once Creative Commons adds that clause to their most popular license, it's game over for training things like Stable Diffusion.

I'm thinking that maybe the most popular software licenses can be extended with a single clause like "usage as training data not allowed".

Of course, we cannot retroactively apply these licenses, so the current models will still be able to generate images/code; they just won't be able to easily use any new works without getting into trouble.

◧◩◪◨
15. Lerc+w9[view] [source] [discussion] 2023-01-14 09:14:08
>>8n4vid+55
You would not need half of the images to perform that test. No more than a handful of images would be needed to prove that the text representation will not produce an identical image to a given image whose description was used as the prompt.

They don't even produce the same image twice from the same description and a different random seed.
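
That spot check is cheap to run. A sketch where generate() and the pixel-difference threshold are hypothetical placeholders:

    import numpy as np

    def reproduces_original(generate, caption, original, seeds=(0, 1, 2, 3, 4),
                            threshold=0.01):
        """original: (H, W, 3) array in [0, 1]. True if any output is near-identical."""
        for seed in seeds:
            candidate = generate(caption, seed=seed)              # new seed each time
            if np.mean(np.abs(candidate - original)) < threshold:
                return True
        return False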

◧◩◪◨
16. realus+3D[view] [source] [discussion] 2023-01-14 14:22:20
>>visarg+H7
The papers I'm finding on those look truly amazing! Thanks a lot for the insights