zlacker

[parent] [thread] 21 comments
1. supriy+(OP)[view] [source] 2023-01-14 07:30:50
Sometimes I have to wonder about the hypocrisy you can see on HN threads. When it's software development, many here seem to understand the merits of a similar lawsuit against Copilot[1], but as soon as it's a different group, such as artists, then it's "no, that's not how a NN works" or "the NN model works just the same way as a human would understand art and style."

[1] https://news.ycombinator.com/item?id=34274326

replies(8): >>hardwa+A >>dymk+11 >>TheMid+k1 >>synu+C1 >>ipsum2+v2 >>NhanH+d3 >>lolind+64 >>turtle+h21
2. hardwa+A[view] [source] 2023-01-14 07:38:48
>>supriy+(OP)
Indeed

Instead of improving the world and creating better tools, they want to sue each other. I thought those times were over.

Or maybe it is just bias against Microsoft

3. dymk+11[view] [source] 2023-01-14 07:46:35
>>supriy+(OP)
How do you know it's the same people making those comments?
4. TheMid+k1[view] [source] 2023-01-14 07:49:18
>>supriy+(OP)
I believe Copilot was giving exact copies of large parts of open source projects, without the license. Are image generators giving exact (or very similar) copies of existing works?

I feel like this is the main distinction.

replies(3): >>limite+X1 >>rivers+H2 >>visarg+E5
5. synu+C1[view] [source] 2023-01-14 07:52:32
>>supriy+(OP)
I suspect it's different people. There's a kind of bias where it seems like everyone else on a forum is all one person who behaves super inconsistently; I've thought that as well.
6. limite+X1[view] [source] [discussion] 2023-01-14 07:58:02
>>TheMid+k1
These models produce a lot of “in the style of” content, which is different from an exact copy. Is that different enough? I guess that’s what this lawsuit is going to be about.
replies(2): >>8n4vid+84 >>TheMid+G4
7. ipsum2+v2[view] [source] 2023-01-14 08:04:08
>>supriy+(OP)
It's only hypocrisy if you think that HN is made up of a single person commenting under multiple accounts and not a diverse group of people with varying opinions.
8. rivers+H2[view] [source] [discussion] 2023-01-14 08:06:53
>>TheMid+k1
> Are image generators giving exact (or very similar) copies of existing works?

um, yes. [1][2] What else would they be trained on?

According to the model card [1], it was trained on this dataset [2], which has hyperlinks to the images, so feel free to peruse:

[1] https://github.com/CompVis/stable-diffusion/blob/main/Stable...

[2] https://huggingface.co/datasets/laion/laion2B-en
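
If you want to peruse programmatically, here's a minimal sketch, assuming the Hugging Face `datasets` library and the URL/TEXT column names listed on the dataset card, that streams a few rows of the metadata instead of downloading all ~2B of them:

    from itertools import islice

    from datasets import load_dataset

    # Stream the metadata; nothing is downloaded up front.
    ds = load_dataset("laion/laion2B-en", split="train", streaming=True)

    # Each row holds an image URL and its caption (per the dataset card).
    for row in islice(ds, 5):
        print(row["URL"], "->", row["TEXT"][:80])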

replies(1): >>chii+A3
9. NhanH+d3[view] [source] 2023-01-14 08:11:01
>>supriy+(OP)
At the very least, Stable Diffusion is much different from Copilot in terms of the model license. I, you, and all the artists have irrevocable access to the model (in practical terms; I'm not interested in discussing whether they can somehow legally strong-arm people out of using the model).

We have only limited access to Copilot. And it is impractical for almost anyone else on earth to train a similar model, while we are 100% sure it is possible to assemble the dataset and redo the training of SD. Just from a pure utilitarian point of view, it's much easier to support fighting against Copilot than SD.

replies(1): >>chii+29
10. chii+A3[view] [source] [discussion] 2023-01-14 08:14:32
>>rivers+H2
> What else would they be trained on?

why does it matter how it was trained? The question is: does the generative AI _output_ copyrighted images?

Training is not a right that the copyright holder owns exclusively. Reproducing the works _is_, but if the AI only reproduces a style, not a copy, then it isn't breaking any copyright.

replies(2): >>hutzli+v4 >>rivers+pX1
11. lolind+64[view] [source] 2023-01-14 08:20:02
>>supriy+(OP)
It's not hypocrisy, it's diversity.

HN is not a person, it's a forum with lots of people with different opinions. Depending on dozens of factors (time of day, title of the article, who gets in first), different opinions will dominate.

I've seen threads on Copilot that overwhelmingly come down in favor of Microsoft and threads on Stable Diffusion that come down hard against it. Also, even in a thread that has a lot of one opinion, there are always those who express the opposite view.

replies(1): >>visarg+b5
12. 8n4vid+84[view] [source] [discussion] 2023-01-14 08:20:13
>>limite+X1
I've seen some overtrained models. They keep showing the same face over and over again, surely from the training data. I don't think you can argue against Stable Diffusion as a whole, but maybe against specific models that haven't muddled the data enough to become something unique.
replies(1): >>visarg+d7
13. hutzli+v4[view] [source] [discussion] 2023-01-14 08:23:33
>>chii+A3
Yes, because real artists are also allowed to learn from other paintings. No problem there, unless they recreate the exact work of others.
replies(1): >>visarg+i6
14. TheMid+G4[view] [source] [discussion] 2023-01-14 08:25:19
>>limite+X1
Yeah what's considered a copy or not is a grey area. Here's a good example of that: https://news.ycombinator.com/item?id=34378300

But artists have been making "in the style of" works for probably millennia. Fan art is a common example.

I suppose the advent of software that makes it easy to make "in the style of" works will force us to get much clearer on what is and isn't a copy. How exciting.

However, I don't see how the software tool is directly at fault, just the person using it.

15. visarg+b5[view] [source] [discussion] 2023-01-14 08:30:29
>>lolind+64
Funny thing - the forum works like a language model. It doesn't have one set personality, but it can generate from a distribution of people. Similarly, a language model can generate from a distribution of prompts, which might be persona descriptions.

> Out of One, Many: Using Language Models to Simulate Human Samples

https://arxiv.org/abs/2209.06899
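
A toy sketch of the paper's idea: condition the same model on different persona prompts and you sample from different slices of a "population". This assumes the Hugging Face `transformers` pipeline; the model choice and the personas are illustrative, not taken from the paper.

    from transformers import pipeline

    # Same model, different persona prompts -> different "people".
    generate = pipeline("text-generation", model="gpt2")

    personas = [
        "A software engineer who ships ML products writes:",
        "A freelance illustrator whose art was scraped writes:",
    ]

    for persona in personas:
        out = generate(
            persona + " My view on AI image generators is",
            max_new_tokens=40,
            do_sample=True,
        )
        print(out[0]["generated_text"], "\n")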

16. visarg+E5[view] [source] [discussion] 2023-01-14 08:35:39
>>TheMid+k1
Not large parts of open source projects. It was one function that was pretty well known and widely replicated. The author prompted with a part of the code, and the model finished the rest, including the original comments.

There are two issues here

- the model needs to be carefully prompted (goaded) into copyright violation; it is instigated to do it by excessive quoting from the original

- the replicated code is usually boilerplate, a common approach, or a "famous" example from books; in other words, examples that appear in multiple places in the training set as opposed to just once

Do generic code, boilerplate, and API calls deserve protection? Maybe the famous examples do, but not every replicated snippet does.
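
To make the goading pattern concrete: seed a code model with the opening lines of a famous snippet and it will often complete the rest near-verbatim, precisely because that snippet appears many times in the training set. A minimal sketch, assuming the `transformers` pipeline; the model name is illustrative and any sufficiently trained code model would do:

    from transformers import pipeline

    # Seed with the opening of Quake's fast inverse square root; a model
    # that saw it many times during training tends to reproduce the rest.
    complete = pipeline("text-generation", model="Salesforce/codegen-350M-multi")

    seed = "float Q_rsqrt( float number )\n{\n    long i;\n    float x2, y;\n"
    print(complete(seed, max_new_tokens=80)[0]["generated_text"])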

replies(1): >>jeroen+CY
17. visarg+i6[view] [source] [discussion] 2023-01-14 08:41:32
>>hutzli+v4
Banning AI from training on copyrighted works is also problematic, because copyright doesn't protect ideas, only expression. So the model has a legitimate right to learn ideas (minus expression) from any source.

For example, the facts in a phone book are not copyrighted; the authors have to mix in fake data to be able to claim copyright infringement. Maybe the models could finally learn how many fingers to draw on a hand.

18. visarg+d7[view] [source] [discussion] 2023-01-14 08:52:10
>>8n4vid+84
There's a small industry around fine-tuning a model on your own photos to generate fantasy images of yourself / to see yourself in a different way.
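
Roughly what those services run under the hood: a Stable Diffusion checkpoint fine-tuned (DreamBooth-style) on your photos, bound to a rare token. A minimal inference sketch, assuming the `diffusers` library; the checkpoint path and the "sks person" token are placeholders.

    import torch

    from diffusers import StableDiffusionPipeline

    # Load a checkpoint previously fine-tuned on personal photos.
    pipe = StableDiffusionPipeline.from_pretrained(
        "./my-dreambooth-checkpoint", torch_dtype=torch.float16
    ).to("cuda")

    # "sks person" is the rare token the fine-tune bound your likeness to.
    image = pipe("portrait of sks person as a medieval knight, oil painting").images[0]
    image.save("knight.png")
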
19. chii+29[view] [source] [discussion] 2023-01-14 09:10:29
>>NhanH+d3
disregarding the access part, i say copilot also does not violate copyright, insofar as it only reproduces insubstantial portions of existing works.

If you asked copilot to reproduce an existing work, then surely that violates copyright - in the same way you can ask SD to reproduce one of the training images (which would violate copyright in the same way).

But neither the training nor the usage of these ML models violates copyright by itself. Only when someone produces a copyrighted work from it does that particular _usage_ instance violate copyright, and that does not invalidate any other usages.

20. jeroen+CY[view] [source] [discussion] 2023-01-14 17:12:29
>>visarg+E5
Copilot didn't just spit out the fast inverse square root; it spat out someone's entire "about" page in HTML, name and all. This was just some guy's blog, not a commonly replicated algorithm from a book.

Furthermore, copyright infringement doesn't stop being copyright infringement if you do it based on someone else's copyright infringement. Just because someone else decided to rip the contents of a CD and upload it to a website doesn't mean I'm now allowed to download it from that website.

Copyright law does include an originality floor: you can't copyright a letter or a shape (unless you're a billion-dollar startup), in the same way that you can't copyright fizzbuzz or hello world. I don't think that's relevant for many of the algorithms Copilot will generate for you, though.

If simple work doesn't deserve protection, the pop music industry with their generic lyrics and simple tunes may be in big trouble. Disney as well, with their simplistic cartoon characters like Donald Duck and Mickey Mouse.

Personally, I think copyright laws are extremely damaging in their duration and restrictions. IP law in a small number of countries actually allows for patenting algorithms, which is equally silly. International IP law currently gets in the way of society, in my opinion.

However, neither programmers nor artists will be served without short-term copyright, and I don't think anyone but knock-off companies would be happy with such an arrangement. Five or ten years is long enough for copyright in my book, but within those five or ten years copyright must remain protected.

21. turtle+h21[view] [source] 2023-01-14 17:40:16
>>supriy+(OP)
Aren't they different?

Stable Diffusion is about closed to open.

Copilot is about open to closed.

The Stable Diffusion version of Copilot would be something like

"Give me a cart checkout algorithm in the style of Carmack, secure C style."

And that's fine, if the destination code's license were just as open as--or even less restrictive than--the source code's license (relicensing rules permitting).

What could be the issue is if the generated source becomes even more closed or proprietary, which defeats the original source's intent.

Is that right?

22. rivers+pX1[view] [source] [discussion] 2023-01-15 00:49:01
>>chii+A3
Agree 100%. I misread the post as "given" rather than "giving" and was answering what I perceived the question to be (are models given copyrighted images). Oops.