I believe Copilot was giving exact copies of large parts of open source projects, without the license. Are image generators giving exact (or very similar) copies of existing works?

I feel like this is the main distinction.

>>TheMid+(OP)
These models produce a lot of “in the style of” content, which is different from an exact copy. Is that different enough? I guess that’s what this lawsuit is going to be about.

>>TheMid+(OP)
> Are image generators giving exact (or very similar) copies of existing works?

um, yes.[1][2] What else would they be trained on?

According to the model card:

[1] https://github.com/CompVis/stable-diffusion/blob/main/Stable...

it was trained on this dataset (which has hyperlinks to images, so feel free to peruse):

[2] https://huggingface.co/datasets/laion/laion2B-en

>>rivers+n1
> What else would they be trained on?

Why does it matter how it was trained? The question is, does the generative AI _output_ copyrighted images?

Training is not a right that the copyright holder owns exclusively. Reproducing the works _is_, but if the AI only reproduces a style, but not a copy, then it isn't breaking any copyright.

>>limite+D
I've seen some overtrained models. They keep showing the same face over and over again, surely from the training data. I don't think you can argue against Stable Diffusion as a whole, but maybe against specific models that haven't muddled the data enough to become something unique.

>>chii+g2
Yes, because real artists are also allowed to learn from other paintings. No problem there, unless they recreate the exact work of others.

>>limite+D
Yeah, what's considered a copy or not is a grey area. Here's a good example of that: https://news.ycombinator.com/item?id=34378300

But artists have been making "in the style of" works for probably millennia. Fan art is a common example.

I suppose the advent of software that makes it easy to make "in the style of" works will force us to get much more clear on what is and isn't a copy. How exciting.

However, I don't see how the software tool is directly at fault, just the person using it.

>>TheMid+(OP)
Not large parts of open source projects. It was one function that was well known and widely replicated. The author prompted with part of the code, and the model finished the rest, including the original comments.

There are two issues here

- the model has to be carefully prompted (goaded) into copyright violation; it is instigated to do it by excessive quoting from the original

- the replicated code is usually boilerplate, a common approach, or a "famous" example from a book; in other words, these are examples that appear in multiple places in the training set as opposed to just once

Do generic code, boilerplate and API calls deserve protection? Maybe the famous examples do, but not every replicated snippet does.

>>hutzli+b3
Banning AI from training on copyrighted works is also problematic because copyright doesn't protect ideas, it only protects expression. So the model has a legitimate right to learn ideas (minus expression) from any source.

For example, facts in the phonebook are not copyrighted; the authors have to mix in fake data to be able to claim copyright infringement. Maybe the models could finally learn how many fingers to draw on a hand.

>>8n4vid+O2
It's a small industry to fine-tune a model on your photos to generate fantasy images of yourself / to see yourself in a different way.

>>visarg+k4
Copilot didn't just spit out the fast inverse square root, it spat out someone's entire "about" page in HTML, name and all. This was just some guy's blog, not a commonly replicated algorithm from a book.

Furthermore, copyright infringement doesn't stop being copyright infringement if you do it based on someone else's copyright infringement. Just because someone else decided to rip the contents of a CD and upload it to a website doesn't mean I'm now allowed to download it from that website.

Copyright law does include an originality floor: you can't copyright a single letter or a simple shape (unless you're a billion-dollar startup), and in the same way you can't copyright fizzbuzz or hello world. I don't think that floor is relevant for many of the algorithms Copilot will generate for you, though.

If simple work doesn't deserve protection, the pop music industry with their generic lyrics and simple tunes may be in big trouble. Disney as well, with their simplistic cartoon characters like Donald Duck and Mickey Mouse.

Personally, I think copyright laws are extremely damaging in their duration and restrictions. IP law in a small number of countries actually allows for patenting algorithms, which is equally silly. In my opinion, international IP law currently gets in the way of society.

However, without short-term copyright, neither programmers nor artists will be happy, and I don't think anyone but knock-off companies would be happy with such an arrangement. Five or ten years is long enough for copyright in my book, but within those five or ten years copyright must remain protected.

>>chii+g2
Agree 100%. I misread the post as "given" rather than "giving" and was answering what I perceived the question to be (are models given copyrighted images?). Oops.