In fairness, Diffusion is arguably a very complex entropy-coding scheme, similar to arithmetic or Huffman coding.
Given that copyright is protectable even on compressed/encrypted files, it seems fair that the “container of compressed bytes” (in this case the Diffusion model) does “contain” the original images no differently than a compressed folder of images contains the original images.
A lawyer/researcher would likely win this case if they re-create 90%ish of a single input image from the diffusion model with text input.
I understand people’s livelihoods are potentially at stake, but what a shame it would be if we find AGI, even consciousness but have to shut it down because of a copyright dispute.
Oh, one image is enough to apply copyright as if it were a patent, to ban a process that makes original works most of the time?
The article's authors say it works as a "collage tool", dismissing the composition and layout of the image as unimportant elements, while forgetting that SD changes textures as well. So it's a collage minus textures and composition?
Is there anything left to complain about? Unless, by luck of the draw, both layout and textures end up very similar to a training image. But ensuring no close duplications are allowed should suffice.
Copyright should apply one by one, not in bulk. Each work they complain about should be judged on its own merits.
I think it is likely GitHub will do the same with Copilot.
The fact that the derivation involves millions of works as opposed to a single one is immaterial for the copyright issue.
So, digits of pi, anyone?
There's a world of difference that you are just writing off.
No, it doesn't; it means that abstract facts related to this image might be stored.
The data must be encoded with various levels of feature abstraction for this stuff to work at all. Much like humans learning art, though devoid of the input that makes human art interesting (life experience).
I think a more promising avenue for litigating AI plagiarism is to identify that the model understands some narrow slice of the solution space that contains copyrighted works, but is much weaker when you try to deviate from it. Then you could argue that the model has probably used that distinct work rather than learned a style or a category.
I didn’t say it cuz I didn’t think it would resonate, but it’s a whole new world we are quickly entering.
Which of course then arrives at the problem: the original data plainly isn't stored in a byte-exact form, and you can only recover it by providing an astoundingly specific input string (the 512-bit latent space vector). But that's not data contained within Stable Diffusion. It's equivalent to trying to sue a compression codec because a specific archive contains a copyrighted image.
And it would be illegal for me to sell or distribute zipped copies of images without the copyright holder’s consent. Similarly there might be an argument for why Diffusion[1] specifically can’t be built with copyrighted images.
[1] which is just one part of something like Stable Diffusion
You can draw Biden yourself if you're talented and it's not considered a derivative of anything.
This is the most salient point in this whole HN thread!
You can’t sue Stable Diffusion or the creators of it! That just seems silly.
But (I don’t know, I’m not a lawyer) there might be an argument to sue an instance of Stable Diffusion and the creators of it.
I haven’t picked a side of this debate yet, but it has already become a fun debate to watch.
See https://openai.com/blog/dall-e-2-pre-training-mitigations/ "Preventing Image Regurgitation".
But back to your point, “if you were to take the first sentence from a thousand books and use it in your own book”: then yes, based on my understanding of copyright (I am not a lawyer), you would be in violation of IP laws.
Nothing points to that. In fact, even on this website they had to lie about how Stable Diffusion actually works, maybe a sign that their argument isn't really solid enough.
> [1] https://arxiv.org/pdf/2212.03860.pdf
You realize those are considered defects of the model, right? Sure, this model isn't perfect and will be improved.
That's the opposite of this image model's goal. Sure, you might find other types of research models which are meant to do that, but that's not Stable Diffusion and the like.
That said it can sometimes be in violation of copyright if it creates a specific image that is “too close to another original” (just like a human would be in violation even if they never previously saw that image).
But the above is just my intuition (and possibly yours); that doesn't mean a lawyer couldn't make the argument that it's a “good enough lossy compression, just like JPEG but smaller” and therefore “contains the images in just 2 bytes”.
That lawyer may fail to win the argument, but there is a chance that they do win the argument! Especially as researchers keep making Diffusion and SD models better and better at being compression algos (which is a topic people are actively working on).
You can call copying of the input a defect, but why are you simultaneously arguing that it doesn't occur?
It's both undesirable and not relevant to this kind of lawsuit.
The law can do whatever its writers want. The law is mutable, so the answer to your question is “maybe”.
Maybe SD will get outlawed for copyright reasons on a single image. The law and the courts have done sillier things.
Since SD is trained by gradient updates against several different images at the same time, it of course never copies any image bits straight into itself. Since it's a latent-diffusion model, actual "image"ness is limited to the image encoder (VAE), so any fractional bits would be in there if you want to look.
The text encoder (LAION OpenCLIP) does have bits from elsewhere copied straight into it to build the tokens list.
https://huggingface.co/stabilityai/stable-diffusion-2-1/raw/...
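If it helps ground this, here's a minimal sketch using Hugging Face's diffusers library (the checkpoint name is one published release, used purely for illustration) showing that these are separate, inspectable components:

    # Minimal sketch with the `diffusers` library; the checkpoint name is one
    # published Stable Diffusion release, chosen for illustration.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

    print(type(pipe.vae).__name__)           # image encoder/decoder (the VAE)
    print(type(pipe.text_encoder).__name__)  # the OpenCLIP-based text encoder
    print(type(pipe.unet).__name__)          # denoising net trained by gradient updates
    print(len(pipe.tokenizer))               # size of the copied-in token vocabulary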
It's bad news for art websites themselves if that's the case...
You can’t sue Canon for helping a user take better infringing copies of a painting, nor can you sue Apple or Nikon or Sony or Samsung… you can sue the user making an infringing image, not the tools they used to make the infringing image… the tools have no mens rea.
People are treating this like it's a binary technical decision: either it is or isn't a violation. Reality is that things are spectrums, and judges judge. SD will likely be treated like a remix that sampled copyrighted work, but just a tiny bit of each work, and sufficiently transformed it to create a new one.
That's plainly untrue, as Stable Diffusion is not just the algorithm, but the trained model—trained on millions of copyrighted images.
Specifically fair use #3 "the amount and substantiality of the portion used in relation to the copyrighted work as a whole."
A sentence being a copyright violation would make every book review in the world illegal.
Since md5 hashes don't share this property, they're not "in that vein".
The software itself is not at issue here. If they had trained the network on public domain images then there’d be no lawsuit. The legal question to settle is whether it’s allowable to train (and use) a model on copyrighted images without permission from the artists.
They may actually be successful at arguing that the outputs are either copies or derived works which would require paying the original artist for licenses.
That’s not how it works. Your collage would be fine if it was the only one since you used magazines you bought. Where you’d get into trouble is if you started printing copies of your collage and distributing them. In that case you’d be producing derived works and be on the hook for paying for licenses from the original authors.
If a person creates a perfect copy of something it shows they have put thousands of hours of practice into training their skills and maybe dozens or even hundreds of hours into the replica.
When a computer generates a replica of something it's what it was designed to do. AI art is trying to replicate the human process, but it will always have the stink of "the computer could do this perfectly but we are telling it not to right now"
Take Chess as an example. We have Chess engines that can beat even the best human Chess players very consistently.
But we also have Chess engines designed to play against beginners, or at all levels of Chess play really.
We still have Human-only tournaments. Why? Why not allow a Chess Engine set to perform like a Grandmaster to compete in tournaments?
Because there would always be the suspicion that if it wins, it's because it cheated and played above its level when it needed to. Because that's always an option for a computer: to behave like a computer does.
Compression that returns something different from the original most of the time, but still could return the original.
Except with computers, they don't need to eat or sleep, converse or attend stand-ups.
And once you're able to draw that one picture, you could probably draw similar ones. Your own style may emerge too.
Just thinking. Copywriters, students, and scribes used to copy stuff verbatim, sometimes just to "learn" it.
The product of that study could be published works, a synthesis of ideas from elsewhere, and so on. We would say it belonged to the executor, though.
So the AI learned, and what it has created belongs to it. Maybe.
Or, once we acknowledge AI can "see" images, precedent opens the way to citizenship (humanship?)
[1] https://en.wikipedia.org/wiki/Barack_Obama_%22Hope%22_poster
Simply appearing on a shared hosting site should not be enough.
It does have the Mona Lisa because of overfitting. But that's because there are too many copies of the Mona Lisa on the internet.
The artists taking part in this suit won't be able to recreate any of their work.
Stable Diffusion is not made to decompress the original and actually has no direct mechanism for decompressing any originals. The originals are not present. The only thing present is an embedding of key components of the original in a multi-dimensional latent space that also includes text.
This doesn't mean that the outputs of Stable Diffusion cannot be in violation of a copyright, it just means that the operator is going to have to direct the model towards a part of that text/image latent space that violates copyright in some manner... and that the operator of the model, when given an output that is in violation of copyright, is liable for publishing the image. Remember, it is not a violation of copyright to photocopy an image in your house... it's a violation when you publish that image!
SD might know how to violate copyright but is that enough to sue it? Or can you only sue violations it helps create?
If that software happens to output an image that is in violation of copyright then it is not the fault of the model. Also, if you ran this software in your home and did nothing with the image, then there's no violation of copyright either. It only becomes an issue when you choose to publish the image.
The key part of copyright is when someone publishes an image as their own. That they copy an image doesn't matter at all. It's what they DO with the image that matters!
The courts will most likely make a similar distinction between the model, the outputs of the model, and when an individual publishes the outputs of the model. This would be that the copyright violation occurs when an individual publishes an image.
Now, if tools like Stable Diffusion are constantly putting users at risk of unknowingly violating copyrights then this tool becomes less appealing. In this case it would make commercial sense to help users know when they are in violation of copyright. It would also make sense to update our copyright catalogues to facilitate these kinds of fingerprints.
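A hypothetical sketch of what such a check could look like, using the real imagehash and Pillow libraries; the catalogue of registered-work fingerprints is assumed, since nothing like it exists today:

    # Hypothetical near-duplicate check via perceptual hashing. `imagehash` and
    # Pillow are real libraries; the catalogue of registered works is assumed.
    from PIL import Image
    import imagehash

    def looks_like_known_work(generated_path, catalogue, max_distance=8):
        # Perceptual hashes survive resizing and re-encoding, so a small Hamming
        # distance suggests the output is visually close to a catalogued image.
        h = imagehash.phash(Image.open(generated_path))
        return any(h - known <= max_distance for known in catalogue)

    # catalogue = {imagehash.phash(Image.open(p)) for p in registered_work_paths}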
There are no models I know of with the ability to generate an exact copy of an image from their training set, unless a model was trained solely on that image to the point that it could. In that case I could argue the model's purpose was to copy that image, rather than to learn concepts from a variety of images so broad that generating an exact copy would be almost impossible.
I think a lot of the arguments revolving around AI image generators could benefit from the constituent parties reading up on how transformers work. It would at least make the criticisms more pointed and relevant, unlike the criticisms drawn in the linked article.
What I object to is not the AI itself, or even that my code has been used to train it. It's the copyright for me but not for thee way that it's been deployed. Does GitHub/Microsoft's assertion that training sidesteps licensing apply to GitHub/Microsoft's own code? Do they want to allow (a hypothetical) FSFPilot to be trained on their proprietary source? Have they actually trained Copilot on their own source? If not, why not?
I published my source subject to a license, and the force of that license is provided by my copyright. I'm happy to find other ways of doing things, but it has to be equitable. I'm not simply ceding my authorship to the latest commercial content grab.
What do you mean by this in the context of generating images via a prompt? “Fractional bits” don’t make sense, and the phrase is more misleading than anything. Regardless, whether a model violates the criteria for fair use will always be judged by the outputs it generates rather than by its constituent bytes (which can be independent).
Special agents from the MPAA are sent to assassinate an android who can spew out high-quality art.
tl;dr I think there's a distinction between training on copyrighted but public content and private content.
First, there is a legal definition of a "derivative work" and there is an artistic notion of a "derivative work". If the two of us both draw a picture of the Statue of Liberty, artistically we have both derived the drawing from the original statue. However, neither drawing is legally considered a derivative work, either of the original sculpture or of the other drawing.
Let's think about a cartoonish caricature of Joe Biden. What "makes up" Joe Biden?
https://www.youtube.com/watch?v=QRu0lUxxVF4
To what extent are these "constituent parts" present in every image of Joe Biden? All of them? Is the latent space not something that is instead hidden in all images of Joe Biden? Can an image of Joe Biden be made by anyone that is not derived from these "high order" characteristics of what is recognizable as Joe Biden across a number of different renderings from disparate individuals?
People have posted illegal Windows source code leaks to GitHub. Microsoft doesn’t seem to care that much, because these repos stay up for months or even years at a time without Microsoft DMCAing them; if you go looking, you’ll find some right now. I think it is entirely possible, even likely, that some of those repos were included in Copilot’s training data set. So Copilot actually was trained on (some of) Microsoft’s proprietary source code, and Microsoft doesn’t seem to care.
Is it "the model cannot possibly recreate an image from its training set perfectly" or is it "the model is extremely unlikely to recreate an image from its training set perfectly, but it could in theory"?
Because I am willing to bet it's the latter.
> You’re acting like the “computer” has a will of it’s own. Generating a perfect copy of an image would be a completely separate task from training a model for image generation.
Not my intent, of course I don't think computers have a will of their own. What I meant, obviously, is that it's always possible for a bad actor of a human to make the computer behave in a way that is detrimental to other humans and then justify it by saying "the computer did it, all I did is train the model".
Are we looking at the output of the same program? Because all of the output images I look at have eyes looking in different directions and things of horror in place of hands or ears, and they feature glasses melting into people's faces. And those are the good ones; the bad ones have multiple arms contorting out of odd places while bent at unnatural angles.
If licenses don't apply to training, then they don't apply for anyone, anywhere. If they do apply, then Copilot is violating my license.
Save a photo on your computer, open it in a browser or photo viewer, and you will get that photo back. That is the default behavior of computers. That is not in dispute, is it?
All of this machine learning stuff is trying to get them not to do that: to actually create something new that no one stored on them.
Hope that clears up the misunderstanding.
I am not a lawyer, but I also assume Microsoft's position, at least in part, is that they can download and use code in GitHub public repos just like anyone else can, and that developing a public service based on training with that (and a lot of other) code isn't redistributing that code.
Kind of like recreating your image one object at a time. It might not be exact, but close enough.
All of my other points remain unchanged by this pedantry.
SD both creates derivative works and sometimes creates pixel-level copies of portions of the training data.
The best you can do is to mask and keep inpainting the area that looks different until it doesn't.
You cannot copyright “any image that resembles Joe Biden”.
That said, it does raise the question, “should this precedent be extended to humans?”
i.e. Can humans be taught something based on copyrighted materials in the training set/curriculum?
LAION-5b is also just an indexer (in terms of images).
To address (b) first: Fair Use has long held that educational purposes are a valid reason for using copyrighted materials without express permission—for instance, showing a whole class a VHS or DVD, which would technically require a separate release otherwise.
For (a): I don't know anything about your background in ML, so pardon if this is all obvious, but at least current neural nets and other ML programs are not "AI" in anything like the kind of sense where "teaching" is an apt word to describe the process of creating the model. Certainly the reasoning behind the Fair Use exception for educating humans does not apply—there is no mind there to better; no person to improve the life, understanding, or skills of.
But the fact that it often generates new content that didn’t exist before, or at least doesn’t breach the limits of fair use, goes against the argument made in the lawsuit.
So humans can already run afoul of copyright this way, the bar for NNs might end up lower.
- Open Microsoft Paint
- Make a blank 400 x 400 image
- Select a pixel and input an R,G,B value
- Repeat the last two steps
To reproduce a copyrighted work. I'm sure people have done this with, e.g., pixel-art images of copyrighted IP like Mario or Link. At 400x400, it would take 160,000 pixel inputs. At 1 second per pixel, that's roughly 44 hours; a human being could do this in about a week.
Because people have the capability of doing this, and in fact we have proof that people have done so using tools such as MS paint, AND because it is unlikely but possible that someone could reproduce protected IP using such a method, should we ban Microsoft Paint, or the paint tool, or the ability to input raw RGB inputs?
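For what it's worth, the same procedure can be scripted; this sketch with Pillow (file names are placeholders) performs exactly the pixel-by-pixel reproduction described above, just faster than a human with a mouse:

    # The Paint thought experiment as a script: reproduce an image one pixel
    # at a time. Pillow is a real library; the file names are placeholders.
    from PIL import Image

    src = Image.open("copyrighted_pixel_art.png").convert("RGB").resize((400, 400))
    out = Image.new("RGB", (400, 400))
    for y in range(400):
        for x in range(400):
            out.putpixel((x, y), src.getpixel((x, y)))  # one R,G,B input per pixel
    out.save("reproduction.png")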
In this sense, Stable Diffusion is more analogous to the JPEG algorithm than it is to a specific collection of JPEG files. As it stands, the original training data is not stored, even in a compressed form.
This is an intelligence augmentation tool. It’s effectively like I’m really good at reading billions of lines of code and incorporating the learnings into my own code. If you don’t want people learning from your code, don’t publish it.
At some point the input must be considered part of the work. At the limit you could just describe every pixel, but that certainly wouldn’t mean the model contained the work.
While I doubt that specific case has been tested in court, arguably you could. If you created glitch art (https://en.wikipedia.org/wiki/Glitch_art) via compression artifacts, and your work was sufficiently distinct from the original work, I think you would have a reasonable case for transformative use (https://en.wikipedia.org/wiki/Transformative_use).
For example, if I publish a music remix tool with a massive database of existing music, creators might use it to create collages that are original and fall under fair use. But the tool itself is not, and requires permission from the rights owners.
Copyright, and laws in general, exists to protect the human members of society not some abstract representation of them.
Not the way it's used in Stable Diffusion models. Compressed data can be decompressed knowing only the decompression algorithm. To recover data from a stable diffusion model, you need to know the algorithm and the prompt.
A critical part of the information _isn't_ in the data you decompress, it has to come from you. (And this isn't that relevant, but it would be lossy, perceptual compression like jpeg or mp3, not lossless compression like Huffman or Arithmetic coding.)
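To make the contrast concrete, here is a sketch with the diffusers library (the checkpoint is one public release): unlike running a decompressor over an archive, nothing comes out until you supply the missing information yourself, namely a prompt and a seed.

    # Sketch with the `diffusers` library; the checkpoint is one public release.
    # Unlike `unzip archive.zip`, the model yields nothing without a prompt and
    # a seed -- that information has to come from the user.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    generator = torch.Generator().manual_seed(1234)  # the seed: part of the "input"
    image = pipe("an astronaut riding a horse", generator=generator).images[0]
    image.save("output.png")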
Me having bought the magazines also has nothing to do with anything. Would apply equally if they were gifted or free or stolen.
As a thought experiment, imagine a variant of something like SD used for music generation rather than images. It was trained on all the music on Spotify and is marketed as a paid tool for producers and artists. If the model reproduces specific sounds from certain songs, e.g. a specific beat, hook, or melody, it would seem pretty straightforward that the generated content was derivative, even though only a feature of it was precisely reproduced. I could be wrong, but as far as I am aware you need to get permission to use samples. Even if the content is not published, those sounds are being sold by the company as inspiration, and that should violate copyright. The training data is paramount, because if you trained the model on material you generated yourself, or on material with an appropriate CC license, the resulting work would not violate copyright, or you could at least argue independent creation.
In the feature space of images and art, SD is doing something very similar, so I can see the argument that it violates copyright even without reproducing the whole training data.
Overall, I think we will ultimately need to decide how we want these technologies used, what restrictions should be on the training data, etc., and then create new laws specifically for the new technology, rather than trying to shoehorn it into existing copyright law.
A trained model holds relationships between patterns/colours in artwork and their affinity to the other images in the model (ignoring the English tagging of images data within this model for a minute). To this degree, it holds relationships between millions of images and the degree of similarities (i.e. affinity weighting of the patterns within them) in a big blob (the model).
When you ask for a dragon by $ARTIST, it will find within its model an area of data with high affinity to a dragon and to $ARTIST. What has been glossed over in the discussion here is that there are millions of other bits of related images, with lower affinity, from lots of unrelated artwork, which give the generated image uniqueness. Because of this, you can never recreate the original image 1:1; it's always diluted by the relationships from the huge mass of other training data. E.g. a colour from a dinosaur exhibit in a museum may also be incorporated because it looks like a dragon, along with many other minor traits from millions of other images, chosen at random (depending on the seed and other values).
Another interesting point: a picture of a smiling dark-haired woman would have high affinity with the Mona Lisa, but when you prompt for the Mona Lisa you may get parts of that picture back rather than the patterns from the Mona Lisa*, even though the result looks the same. That output (not being the Mona Lisa) is arguably no longer the copyrighted data.
* Nb. this is a contrived example, since in SD the real Mona Lisa weightings will outnumber the individual dark-haired woman's many times over; however, this concept might be (more) appropriate for minor artists whose work is not popular enough to form a significantly large amount of weighting in the training data.
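A rough sketch of the "affinity" idea above, using OpenAI's clip package (the image path is a placeholder, and SD's internals differ; this only illustrates how a smiling dark-haired woman can score close to "the Mona Lisa" in a shared text/image embedding space):

    # Rough illustration of "affinity" as text/image embedding similarity,
    # using OpenAI's `clip` package. The image path is a placeholder; Stable
    # Diffusion's internals differ, this only shows the shared-space idea.
    import torch
    import clip
    from PIL import Image

    model, preprocess = clip.load("ViT-B/32", device="cpu")
    image = preprocess(Image.open("smiling_dark_haired_woman.jpg")).unsqueeze(0)
    texts = clip.tokenize(["the Mona Lisa", "a dragon", "a dinosaur exhibit"])

    with torch.no_grad():
        img_emb = model.encode_image(image)  # (1, 512) image embedding
        txt_emb = model.encode_text(texts)   # (3, 512) text embeddings
        scores = torch.cosine_similarity(img_emb, txt_emb)

    print(scores)  # higher score = higher "affinity" to each phrase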
It's like the compression that occurs when I say "Mona Lisa" and you read it, and can know many aspects of that painting.
Legislation is driven by people who are, in aggregate, not autistic. So it's entirely appropriate to presume that a person not understanding how that process works is indeed autistic, especially if they suggest machines are subjects of law by analogy with human beings.
It's not that autists are bad people; they are just outliers in the political spectrum, as you can see from the complete disconnect of up-voted AI-related comments on Hacker News, where autistic engineers are clearly over-represented, versus just about any venue where other professionals, such as painters or musicians, congregate. Just try to suggest to them that a corporation has the right to use their work for free and profit from it while leaving them unemployed, because the algorithm the corporation uses to exploit them is in some abstract sense similar to how their brain works. That position is so far out on the spectrum that presuming a personality peculiarity of the emitter is the most charitable interpretation available.
So while it would be possible to create a "Public Diffusion" that took the Stable Diffusion refinements of the ML techniques and created a model built solely out of public-domain art, as it stands, "Stable Diffusion" includes by definition the model that is built from the copyrighted works in question.