I feel like this is the main distinction.
Um, yes.[1][2] What else would they be trained on?
According to the model card:
[1] https://github.com/CompVis/stable-diffusion/blob/main/Stable...
It was trained on this data set (which has hyperlinks to images, so feel free to peruse):
Why does it matter how it was trained? The question is, does the generative AI _output_ copyrighted images?
Training is not a right that the copyright holder owns exclusively. Reproducing the works _is_, but if the AI reproduces only a style and not a copy, then it isn't infringing any copyright.
But artists have been making "in the style of" works for probably millennia. Fan art is a common example.
I suppose the advent of software that makes it easy to make "in the style of" works will force us to get much clearer on what is and isn't a copy. How exciting.
However, I don't see how the software tool is directly at fault, just the person using it.
There are two issues here:
- the model needs to be carefully prompted (goaded) into copyright violation, so it is pushed into it by excessive quoting from the original
- the replicated code is usually boilerplate, common approaches or "famous" examples from books; in other words, examples that appear in multiple places in the training set as opposed to just once
Do generic code, boilerplate and API calls deserve protection? Maybe the famous examples do, but not every replicated snippet does.
For example, facts in the phone book are not copyrighted; the authors have to mix in fake data to be able to claim copyright infringement. Maybe the models could finally learn how many fingers to draw on a hand.
Furthermore, copyright infringement doesn't stop being copyright infringement if you do it based on someone else's copyright infringement. Just because someone else decided to rip the contents of a CD and upload it to a website doesn't mean I'm now allowed to download it from that website again.
Copyright law does include an originality floor: you can't copyright a letter or a shape (unless you're a billion-dollar startup), and in the same way you can't copyright fizzbuzz or hello world. I don't think that's relevant for many algorithms Copilot will generate for you, though.
If simple work doesn't deserve protection, the pop music industry with their generic lyrics and simple tunes may be in big trouble. Disney as well, with their simplistic cartoon characters like Donald Duck and Mickey Mouse.
Personally, I think copyright laws are extremely damaging in their duration and restrictions. IP law in a small number of countries actually allows for patenting algorithms, which is equally silly. International IP law currently gets in the way of society, in my opinion.
However, without short-term copyright, neither programmers nor artists will be happy, and I don't think anyone but knock-off companies will be happy with such an arrangement. Five or ten years is long enough for copyright in my book, but within those five or ten years copyright must remain protected.