zlacker

[parent] [thread] 269 comments
1. kweing+(OP)[view] [source] 2022-10-16 20:27:21
I’ve noticed that people tend to disapprove of AI trained on their profession’s data, but are usually indifferent or positive about other applications of AI.

For example, I know artists who are vehemently against DALL-E, Stable Diffusion, etc. and regard it as stealing, but they view Copilot and GPT-3 as merely useful tools. I also know software devs who are extremely excited about AI art and GPT-3 but are outraged by Copilot.

For myself, I am skeptical of intellectual property in the first place. I say go for it.

replies(33): >>ChildO+K >>machin+c1 >>tpxl+j1 >>dawner+52 >>ghowar+E2 >>wzdd+74 >>teddyh+a4 >>pclmul+y4 >>bayind+a7 >>heavys+F9 >>tables+G9 >>bcrosb+I9 >>yjftsj+ba >>joecot+he >>lucide+oe >>matheu+sg >>9wzYQb+th >>maxbon+Di >>teawre+Uj >>jrm4+wk >>sattos+3o >>Taylor+tr >>zahrc+us >>dopido+Mu >>Krishn+Gw >>cercat+Kw >>cypres+sz >>sander+HB >>sineno+tC >>sicp-e+yF >>stevew+1G >>orbita+BL >>sireat+ID1
2. ChildO+K[view] [source] 2022-10-16 20:35:13
>>kweing+(OP)
I think, sadly, it's just people being protective: the technology is interesting, so if it doesn't hit their line of work it's fantastic; if it does, then it's terrible.

There is no arguing against it though; you can't stop it. All this stuff is eventually coming to all of these areas, so you might as well try to find ways to use the opportunities while some of this is still new.

replies(1): >>naillo+m2
3. machin+c1[view] [source] 2022-10-16 20:38:34
>>kweing+(OP)
I'm pretty sure DALL-E was trained only on non-copyrighted material (they say so :| ).

But to be honest, if your code is open source, I'm pretty sure Microsoft doesn't care about the license; they'll just use it because "reasons". Same with Stable Diffusion: they don't give a fuck about the data. If it's on the internet they'll use it, so it's a topic that will probably be regulated in a few years.

Until then, let's hope they both get milked (Microsoft and NovelAI) for illegal content usage, and I seriously hope at least a few lawyers try milking them ASAP, especially NovelAI, which illegally used a lot of copyrighted art in its training data.

replies(1): >>msbarn+d2
4. tpxl+j1[view] [source] 2022-10-16 20:39:26
>>kweing+(OP)
When Joe Rando plays a song from 1640 on a violin he gets a copyright claim on Youtube. When Jane Rando uses devtools to check a website source code she gets sued.

When Microsoft steals all code on their platform and sells it, they get lauded. When "Open" AI steals thousands of copyrighted images and sells them, they get lauded.

I am skeptical of imaginary property myself, but fuck this one set of rules for the poor, another set of rules for the powerful.

replies(15): >>gw99+r3 >>a4isms+R3 >>insani+N4 >>stickf+b8 >>rtkwe+o8 >>e40+D8 >>c7b+F8 >>cyanyd+D9 >>lo_zam+2a >>foobar+zg >>foobar+Vg >>Aeolun+ki >>znpy+Kj >>versio+9k >>aejnsn+Ss
5. dawner+52[view] [source] 2022-10-16 20:46:29
>>kweing+(OP)
In theory AI should never return an exact copy of a copyrighted work, or even anything close enough that you could argue it's the original "just changed". If the styles are the same I think that's fine, no different from someone else cloning it. But there are definitely outputs from Stable Diffusion that look like the original with some weird artifacts.

We need regulation around it.

replies(3): >>rtkwe+G8 >>XorNot+kc >>orbita+tN
◧◩
6. msbarn+d2[view] [source] [discussion] 2022-10-16 20:47:02
>>machin+c1
> I'm pretty sure DALL-E was trained only on not copyright material

Nope. DALL-E generates images with the Getty watermark, so there are clearly copyrighted materials in its training set: https://www.reddit.com/r/dalle2/comments/xdjinf/its_pretty_o...

replies(3): >>machin+23 >>pclmul+A5 >>nottor+87
◧◩
7. naillo+m2[view] [source] [discussion] 2022-10-16 20:47:50
>>ChildO+K
I mean we definitely can stop it. Laws are a pretty strong deterrent.
replies(4): >>ghaff+93 >>faerie+D3 >>tpm+d4 >>BeFlat+aA
8. ghowar+E2[view] [source] 2022-10-16 20:50:42
>>kweing+(OP)
I am a programmer who has written extensively on my blog and HN against Copilot.

I am also not a hypocrite; I do not like DALL-E or Stable Diffusion either.

As a sibling comment implies, these AI tools give more power to people who control data, i.e., big companies or wealthy people, while at the same time, they take power away from individuals.

Copilot is bad for society. DALL-E and Stable Diffusion are bad for society.

I don't know what the answer is, but I sure wish I had the resources to sue these powerful entities.

replies(8): >>willia+A3 >>c7b+j5 >>akudha+I6 >>cmdial+x9 >>vghfgk+ea >>epolan+Lh >>csalle+Pm >>BeFlat+Vz
◧◩◪
9. machin+23[view] [source] [discussion] 2022-10-16 20:53:42
>>msbarn+d2
Thanks for posting this, I'd never seen that before. The original paper says no copyrighted content was used, but that could just be lies, who knows; the data speaks for itself. If they can prove in court that copyrighted images were used, they should get punished (so again, Microsoft getting rekt for that will be good to see :] ).
◧◩◪
10. ghaff+93[view] [source] [discussion] 2022-10-16 20:54:08
>>naillo+m2
"We" maybe can't stop it. But if there were the political will to kneecap many uses of machine learning, it's not obvious there's any reason it couldn't be done even if not 100% effective. Whether that would be a good thing is a different question.
◧◩
11. gw99+r3[view] [source] [discussion] 2022-10-16 20:56:35
>>tpxl+j1
If this is the new status quo then I suggest we find out how to fuck up the corpus as best as possible.
◧◩
12. willia+A3[view] [source] [discussion] 2022-10-16 20:58:07
>>ghowar+E2
I’m a programmer and a songwriter and I am not worried about these tools and I don’t think they are bad for society.

What did the photograph do to the portrait artist? What did the recording do to the live musician?

Here’s some highfalutin art theory on the matter, from almost a hundred years ago: https://en.wikipedia.org/wiki/The_Work_of_Art_in_the_Age_of_...

replies(4): >>Samoye+v4 >>ghowar+05 >>snarfy+Ef >>__alex+fg
◧◩◪
13. faerie+D3[view] [source] [discussion] 2022-10-16 20:58:27
>>naillo+m2
You can slow this down, but you can't stop it whatsoever. It's about as futile an effort as trying to stop piracy. People are ALREADY running Salesforce CodeGen and Stable Diffusion at home. You can't put the genie back in the bottle, and what we'll have 20 years from now is going to give critics of these tools nightmares.

If you try to outlaw it, the day before the laws come into effect, I'm going to download the very best models out there and run it on my home computer. I'll start organising with other scofflaws and building our own AI projects in the fashion of leelachesszero with donated compute time.

You can shut down the commercial versions of these tools. You can scare large corporations away from using these tools. You can pull an uno reverse card and use modified versions of the tools to CHECK for copyright infringement and sue people under existing laws, and you'll probably even be able to statistically prove somebody is an AI user. But STOPPING the use of these tools? Go ahead and try; it won't happen.

replies(1): >>tables+1a
◧◩
14. a4isms+R3[view] [source] [discussion] 2022-10-16 21:01:20
>>tpxl+j1
> one set of rules for the poor, another set of rules for the powerful.

Conservatism consists of exactly one proposition, to wit:

There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect.

—Composer Frank Wilhoit[1]

[1]: https://crookedtimber.org/2018/03/21/liberals-against-progre...

replies(1): >>sbuttg+o5
15. wzdd+74[view] [source] 2022-10-16 21:03:26
>>kweing+(OP)
This talking point seems to come up often, but since it's basically calling people hypocrites, I think it's a bad-faith thing to say without reasonable proof that it's not a fringe opinion (or completely invented).

For what it's worth, the people I know who are opposed to this sort of "useful tool" don't discriminate by profession.

16. teddyh+a4[view] [source] 2022-10-16 21:03:40
>>kweing+(OP)
An accusation of hypocrisy is not an argument; at least not a relevant one.
replies(1): >>kweing+yp
◧◩◪
17. tpm+d4[view] [source] [discussion] 2022-10-16 21:04:18
>>naillo+m2
What would the law do? Forbid automatic data collection and/or indexing and further use without explicit copyright-holder agreement? That would essentially ban the whole internet as we know it. Not saying that would be bad, but it's never going to happen; there's too much accumulated momentum in the opposite direction.
replies(1): >>chiefa+na
◧◩◪
18. Samoye+v4[view] [source] [discussion] 2022-10-16 21:06:41
>>willia+A3
But this isn't like photography and portrait artistry. This is more like a wealthy person stealing your entire art catalog, laundering it in some fancy way, and then claiming they are the original creator. Stable Diffusion has literally been used to create new art by screenshotting someone's live-streamed art creation process as the seed. While creating derivative work has always been considered art (such as erasure poetry and collage), it's extremely uncommon and gauche to never attribute the original(s).
replies(1): >>insani+s5
19. pclmul+y4[view] [source] 2022-10-16 21:06:45
>>kweing+(OP)
I think the distinction is that only one of those classes tends to produce exact copies of work. Programmers get very upset at DALL-E and Stable Diffusion producing exact (and near-exact) copies of artwork too. In contrast to exact copying, production of imitations (not exact copies, but "X in the style of Y") is something that artists have been doing for centuries, and is widely thought of as part of arts education.

For some reason, code seems to lend itself to exact copying by AIs (and also some humans) rather than comprehension and imitation.

replies(2): >>XorNot+5c >>okosla+Ln
◧◩
20. insani+N4[view] [source] [discussion] 2022-10-16 21:08:43
>>tpxl+j1
> Joe Rando plays a song from 1640 on a violin he gets a copyright claim on Youtube

That can't possibly be a valid claim, right? AFAIK copyright expires ~70 years after the original author dies. Until fairly recently it was even shorter. Something from 1640 surely can't be claimed under copyright protection. There are much more recent works where that might not be the case, but 1640?

> When Jane Rando uses devtools to check a website source code she gets sued.

Again, that doesn't sound like a valid suit. Surely she would win? In the few cases I've heard of where suits like this are brought against someone they've easily won them.

replies(9): >>Rodeoc+l5 >>cipher+q6 >>alxlaz+x6 >>lbotos+T6 >>Rimint+q7 >>pessim+c8 >>kevin_+dd >>kmeist+4A >>kube-s+aL
◧◩◪
21. ghowar+05[view] [source] [discussion] 2022-10-16 21:10:11
>>willia+A3
Do you know what's different about the photograph or the recording?

They are still their own separate works!

If a painter paints a person for commission, and then that person also commissions a photographer to take a picture of them, is the photographer infringing on the copyright of the painter? Absolutely not; the works are separate.

If a recording artist records a public domain song that another artist performs live, is the recording artist infringing on the live artist? Heavens, no; the works are separate.

On the other hand, these "AI's" are taking existing works and reusing them.

Say I write a song, and in that song, I use one stanza from the chorus of one of your songs. Verbatim. Would you have a copyright claim against me for that? Of course, you would!

That's what these AI's do; they copy portions and mix them. Sometimes they are not substantial portions. Sometimes, they are, with verbatim comments (code), identical structure (also code), watermarks (images), composition (also images), lyrics (songs), or motifs (also songs).

In the reverse of your painter and photographer example, we saw US courts hand down judgment against an artist who blatantly copied a photograph. [1]

Anyway, that's the difference between the tools of photography (creates a new thing) and sound recording (creates a new thing) versus AI (mixes existing things).

And yes, sound mixing can easily stray into copyright infringement. So can other copying of various copyrightable things. I'm not saying humans don't infringe; I'm saying that AI does by construction.

[1]: https://www.reuters.com/world/us/us-supreme-court-hears-argu...

replies(1): >>willia+Y6
◧◩
22. c7b+j5[view] [source] [discussion] 2022-10-16 21:12:47
>>ghowar+E2
> these AI tools give more power to people who control data, i.e., big companies or wealthy people, while at the same time, they take power away from individuals.

Not sure I agree, but I can at least see the point for Copilot and DALL-E - but Stable Diffusion? It's open source, it runs on (some) home-use laptops. How is that taking away power from indies?

Just look at the sheer number of apps building on or extending SD that were published on HN, and that's probably just the tip of the iceberg. Quite a few of them at least looked like side projects by solo devs.

replies(1): >>ghowar+d6
◧◩◪
23. Rodeoc+l5[view] [source] [discussion] 2022-10-16 21:12:54
>>insani+N4
This isn't a legal copyright claim, it's a "YouTube" copyright claim which is entirely owned and enforced by YouTube.
replies(1): >>insani+U5
◧◩◪
24. sbuttg+o5[view] [source] [discussion] 2022-10-16 21:13:28
>>a4isms+R3
Thanks for posting the link to the quote. Having said that, I don't think it's possible to quote that bit and get an understanding of the idea being conveyed without its opening context. Indeed, quoting it alone is likely to give a false idea of what's being conveyed. From earlier in the same post:

"There is no such thing as liberalism — or progressivism, etc.

There is only conservatism. No other political philosophy actually exists; by the political analogue of Gresham’s Law, conservatism has driven every other idea out of circulation."

replies(1): >>a4isms+o6
◧◩◪◨
25. insani+s5[view] [source] [discussion] 2022-10-16 21:13:48
>>Samoye+v4
> This is more like a wealthy person stealing your entire art catalog, laundering it in some fancy way, and then claiming they are the original creator.

If I take a song, cut it up, and sing over it, my release is valid. If I parody your work, that's my work. If you paint a picture of a building and I go to that spot and take a photograph of that building it is my work.

I can derive all sorts of things, things that I own, from things that others have made.

Fair use is a thing: https://www.copyright.gov/fair-use/

As for attributing the originals, would an artist credit every piece of inspiration they have ever encountered over a lifetime? Publishing a seed seems like a nice thing to do, but pointing at the billion pictures that went into the drawing seems silly.

replies(2): >>tremon+cg >>Samoye+yC
◧◩◪
26. pclmul+A5[view] [source] [discussion] 2022-10-16 21:15:12
>>msbarn+d2
Lots of people ironically put the Getty watermark on pictures and memes that they make to satirically imply that they are pulling stock photos off the internet with the printscreen function instead of paying for them.
replies(1): >>msbarn+H8
◧◩◪◨
27. insani+U5[view] [source] [discussion] 2022-10-16 21:17:53
>>Rodeoc+l5
OK but then we're just talking about content moderation, which seems like a separate issue. I think using "YouTube copyright claim" as a proxy for "legal copyright claim" is more to the parent's point, especially since that's how YouTube purports the claim to work. Otherwise it feels irrelevant.
replies(2): >>cipher+W6 >>lupire+79
◧◩◪
28. ghowar+d6[view] [source] [discussion] 2022-10-16 21:22:40
>>c7b+j5
SD is better than the other two, but it will still centralize control.

I imagine that Disney would take issue with SD if material that Disney owned the copyright to was used in SD. They would sue. SD would have to be taken off the market.

Thus, Disney has the power to ensure that their copyrighted material remains protected from outside interests, and they can still create unique things that bring in audiences.

Any small-time artist that produces something unique will find their material eaten up by SD in time, and then, because of the sheer number of people using SD, that original material will soon have companions that are like it because they are based on it in some form. Then, the original won't be as unique.

Anyone using SD will not, by definition, be creating anything unique.

And when it comes to art, music, photography, and movies, uniqueness is the best selling point; once something is not unique, it becomes worth less because something like it could be gotten somewhere else.

SD still has the power to devalue original work; it just gives normal people that power on top of giving it to the big companies, while the original works of big companies remain safe because of their armies of lawyers.

replies(2): >>c7b+9a >>cortes+dt
◧◩◪◨
29. a4isms+o6[view] [source] [discussion] 2022-10-16 21:23:41
>>sbuttg+o5
I agree that adds considerable depth to the value of the quote, and connects it to the conversation he appeared to be having, which is about the first line you've quoted:

There is no such thing as being a Liberal or Progressive, there is only being a Conservative or anti-Conservative, and while there is much nuänce and policy to debate about that, it boils down to deciding whether you actually support or abhor the idea of "the law" (which is a much broader concept than just the legal system) existing to enforce or erase the distinction between in-groups and out-groups.

But that's just my read on it. Getting back to intellectual property, it has become a bitter joke on artists and creatives, who are held up as the beneficiaries of intellectual property laws in theory, but in practice are just as much of an out-group as everyone else.

We are bound by the law—see patent trolls, for example—but not protected by it unless we have pockets deep enough to sue Disney for not paying us.

◧◩◪
30. cipher+q6[view] [source] [discussion] 2022-10-16 21:23:57
>>insani+N4
The poster isn't claiming that this is a valid DMCA suit. Nearly everyone at a mildly decent level who has posted their own recordings of classical music to YouTube has received these claims _in their Copyright section_. YouTube itself prefixes this with a lengthy disclaimer about how this isn't the DMCA process, but reserves the right to kick you off the site based on fraudulent matches made by its algorithms.

They are absolutely, completely, and utterly bullshit. Nobody with half an ear for music will mistake my playing of Bach's G minor Sonata for Arthur Grumiaux's (too many out-of-tune notes :-D). Yet YouTube still manages to match his recording to my playing, probably because it had never heard mine before (I recorded it mere minutes earlier).

So no, it isn't a valid claim, but this algorithm, trained on certain examples of work, makes bad classifications with potentially devastating ramifications for the creator. (I'm not a monetized YouTube artist, but if this triggered a complete lockout of my Google account(s), it would likely end Very Badly.)

I think it's a very relevant comparison to the GP's examples.
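To make the failure mode concrete, here's a toy sketch (my own illustration with made-up note data, nothing like YouTube's actual Content ID): a matcher keyed on composition-level features, i.e. the note sequence, will "match" any performance of the same piece, wrong notes and all.

```python
# Toy sketch: fingerprinting a performance by its note sequence alone.
# Timing, timbre, and who actually played are ignored, so any recording
# of the same public-domain score collides with the "reference".

def fingerprint(notes, n=4):
    """Hash every n-gram of pitches in the sequence."""
    return {hash(tuple(notes[i:i + n])) for i in range(len(notes) - n + 1)}

def similarity(a, b):
    """Jaccard overlap of the two fingerprints (0.0 to 1.0)."""
    fa, fb = fingerprint(a), fingerprint(b)
    return len(fa & fb) / max(len(fa | fb), 1)

# Hypothetical reference recording of a public-domain piece (MIDI pitches).
reference = [67, 70, 74, 70, 67, 62, 65, 69, 72, 69, 65, 62]
# An amateur take on the same score, with a wrong note near the end.
amateur   = [67, 70, 74, 70, 67, 62, 65, 69, 72, 69, 66, 62]
# A genuinely different melody.
other     = [60, 64, 67, 72, 67, 64, 60, 55, 59, 62, 65, 62]

print(similarity(reference, amateur))  # high: flagged as a "match"
print(similarity(reference, other))    # zero: no shared n-grams
```

The overlap comes entirely from the shared public-domain score, not from copying anyone's recording, which is exactly the composition-vs-recording confusion being complained about.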

replies(1): >>ndiddy+kt
◧◩◪
31. alxlaz+x6[view] [source] [discussion] 2022-10-16 21:25:08
>>insani+N4
> That can't possibly be a valid claim, right?

It's not, but good luck talking to a human at Youtube when the video gets taken down.

> Again, that doesn't sound like a valid suit. Surely she would win?

Assuming she could afford the lawyer, and that she lives through the stress and occasional mistreatment by the authority, yes, probably. Both are big ifs, though.

replies(1): >>makeit+Uh
◧◩
32. akudha+I6[view] [source] [discussion] 2022-10-16 21:26:13
>>ghowar+E2
> but I sure wish I had the resources to sue these powerful entities.

I wonder if there is a crowdfunding platform like GoFundMe for lawsuits. Or could GoFundMe itself be used for this purpose? It would be fantastic to sue the mega-polluters, lying media like Fox, etc.

That said, even with a lot of money, are these cases winnable? Especially given the current state of Supreme Court and other federal courts?

◧◩◪
33. lbotos+T6[view] [source] [discussion] 2022-10-16 21:28:16
>>insani+N4
> That can't possibly be a valid claim, right?

I'm not a lawyer, but my understanding is that while the "1640s violin composition" itself may be out of copyright, if I record myself playing it, my recording of that piece is under my copyright. So if you took my file (somehow) and used it without my permission, and I could prove it, I could claim copyright infringement.

That's my understanding, and I've personally operated that way to avoid any issues since it errs on the side of safety. (Want to use old music, make sure the license of the recording explicitly says public domain or has license info)

replies(3): >>lupire+d9 >>vghfgk+3b >>insani+3c
◧◩◪◨⬒
34. cipher+W6[view] [source] [discussion] 2022-10-16 21:28:56
>>insani+U5
Copyright claims are a form of content moderation, by preventing reuse of content that others own.

But it can still be weaponized to prevent legitimate resubmissions of parallel works, that can potentially deplatform legitimate users, depending on the reviewer and the clarity of the rebuttal.

◧◩◪◨
35. willia+Y6[view] [source] [discussion] 2022-10-16 21:29:14
>>ghowar+05
I'm not so sure that originality is that different between a human and a neural network. That is to say, what a human artist does has always involved a lot of mixing of existing creations. Art needs a certain level of familiarity in order to be understood by an audience. I didn't invent 4/4 time or a I-IV-V progression, and I certainly wasn't the first person to tackle the rhyme schemes or subject matter of my songs. I wouldn't be surprised if there were fragments of other songs in my lyrics or melodies, either from something I heard long ago or just by coincidence. There's only so much you can do with a folk song to begin with!

BTW, what happened after the photograph is that there were fewer portrait artists. And after the recording there were fewer live musicians. There are certainly no fewer artists or musicians overall, though!

replies(2): >>ghowar+N8 >>amanuo+qT
◧◩◪
36. nottor+87[view] [source] [discussion] 2022-10-16 21:31:28
>>msbarn+d2
Dunno about Getty, but I've been shown the cover for Beatles' Yellow Submarine done in different colors as some great AI advancement.
37. bayind+a7[view] [source] 2022-10-16 21:31:39
>>kweing+(OP)
I, with my software developer hat, am not excited by AI. Not a bit, honestly. Esp. about these big models trained on huge amount of data, without any consent.

Let me be perfectly clear. I'm all for the tech. The capabilities are nice. The thing I'm strongly against is training these models on any data without any consent.

GPT-3 is OK, training it with public stuff regardless of its license is not.

Copilot is OK; training it with GPL/LGPL-licensed code without consent is not.

DALL-E/MidJourney/Stable Diffusion is OK. Training it with non public domain or CC0 images is not.

"We're doing something amazing, hence we need no permission" is ugly to put it very lightly.

I've left GitHub because of CoPilot. Will leave any photo hosting platform if they hint any similar thing with my photography, period.

replies(2): >>psychp+B9 >>Aeolun+tq
◧◩◪
38. Rimint+q7[view] [source] [discussion] 2022-10-16 21:34:58
>>insani+N4
> That can't possibly be a valid claim, right?

It has indeed happened.

https://boingboing.net/2018/09/05/mozart-bach-sorta-mach.htm...

Sony later withdrew their copyright claim.

There are two pieces to copyright when it comes to public domain:

* The work (song) itself -- can't copyright that

* The recording -- you are the copyright owner. No one, without your permission, can re-post your recording

And of course, there is derivative work. You own any portion that is derivative of the original work.

replies(1): >>insani+Yb
◧◩
39. stickf+b8[view] [source] [discussion] 2022-10-16 21:43:17
>>tpxl+j1
Yeah, inequality sucks. So how about we focus on making the world better for everyone instead of making the world equally shitty for everyone?
replies(2): >>imwill+4f >>zopa+Nf
◧◩◪
40. pessim+c8[view] [source] [discussion] 2022-10-16 21:43:26
>>insani+N4
> Again, that doesn't sound like a valid suit. Surely she would win? In the few cases I've heard of where suits like this are brought against someone they've easily won them.

That's freedom of speech for everyone who can afford a lawyer to bring suit against a music rights-management company.

replies(1): >>insani+6c
◧◩
41. rtkwe+o8[view] [source] [discussion] 2022-10-16 21:45:01
>>tpxl+j1
I think copilot is a clearer copyright violation than any of the stable diffusion projects though because code has a much narrower band of expression than images. It's really easy to look at the output of CoPilot and match it back to the original source and say these are the same. With stable diffusion it's much closer to someone remixing and aping the images than it is reproducing originals.

I haven't been following super closely but I don't know of any claims or examples where input images were recreated to a significant degree by stable diffusion.
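As a rough illustration of why matching code back to its source is so easy (a toy sketch of my own, not a real audit tool; `longest_shared_run` and the snippets are made up): normalize away whitespace by tokenizing, and verbatim reuse shows up as a long shared token run.

```python
# Toy sketch: detect verbatim code reuse by the longest run of identical
# normalized tokens shared between a generated snippet and a corpus snippet.

import re

def tokens(code):
    # Crude tokenizer: identifiers, numbers, then any other non-space char.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def longest_shared_run(generated, source):
    """Length of the longest token run the two snippets share."""
    g, s = tokens(generated), tokens(source)
    best = 0
    for i in range(len(g)):
        for j in range(len(s)):
            k = 0
            while i + k < len(g) and j + k < len(s) and g[i + k] == s[j + k]:
                k += 1
            best = max(best, k)
    return best

corpus_snippet = "float q_rsqrt(float number) { long i; float x2, y; }"
generated      = "float q_rsqrt(float number) { long i; float x2, y; }"
unrelated      = "def add(a, b): return a + b"

print(longest_shared_run(generated, corpus_snippet))  # long run: verbatim copy
print(longest_shared_run(unrelated, corpus_snippet))  # short run: no reuse
```

Because code's band of expression is narrow, even independently written snippets share short runs, but a run covering a whole function is hard to explain as coincidence.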

replies(9): >>makeit+Og >>mr_toa+sh >>Americ+Ai >>paulgb+Fj >>pavlov+il >>DannyB+Tm >>bigiai+Mr >>kmeist+Ax >>dv_dt+TI
◧◩
42. e40+D8[view] [source] [discussion] 2022-10-16 21:48:11
>>tpxl+j1
Preach. So incredibly annoyed when I tried to send a video of my son playing Beethoven to his grandparents and it was taken down due to a copyright violation.
◧◩
43. c7b+F8[view] [source] [discussion] 2022-10-16 21:48:29
>>tpxl+j1
> When Joe Rando plays a song from 1640 on a violin he gets a copyright claim on Youtube. When Jane Rando uses devtools to check a website source code she gets sued.

Do you have any evidence for those claims, or anything resembling those examples?

Music copyright has long expired for classical music, and big shots are definitely not exempt where it does apply. Just look at how much heat Ed Sheeran, one of the biggest contemporary pop stars, got for "stealing" a phrase that was literally just the chant "Oh-I" repeated a few times. (To be clear, I'm familiar with the case and find it infuriating that this petty rent-seeking attempt went to trial at all, even though Sheeran was completely cleared, at great personal distress as he said afterwards.)

And who ever got sued for using dev tools? Is there even a way to find that out?

replies(2): >>codefr+t9 >>banana+Da
◧◩
44. rtkwe+G8[view] [source] [discussion] 2022-10-16 21:48:36
>>dawner+52
Code is much easier to do that with because the avenues for expression are significantly limited compared to just creating an image. For it to be useful, Copilot has to produce compiling, reasonably terse, understandable code. The compiler in particular is a big bottleneck on the range of the output.
◧◩◪◨
45. msbarn+H8[view] [source] [discussion] 2022-10-16 21:48:46
>>pclmul+A5
Memes generally would not fall under the category of non-copyrighted material; most of the time they're extremely copyrighted material just being used without permission. And even a wholly original work that an artist sarcastically stamps with a Getty watermark and then licenses under Creative Commons or something would fall into very murky territory – the Getty watermark itself is Getty's intellectual property. The original image author might plead fair use as satire, but satirical intent isn't really a defence available to DALL-E.

So even if we're assuming these were wholly original works that the author placed under something like a Creative Commons license, the fact that they incorporate an image the author had no rights to would at the very least create a tangled copyright situation. Any really rigorous evaluation of the copyright status of every image in the training set would tend to reject such images as not worth the risk of litigation.

But the more likely scenario here is that they did minimal at best filtering of the training set for copyrights.

replies(1): >>pclmul+ae
◧◩◪◨⬒
46. ghowar+N8[view] [source] [discussion] 2022-10-16 21:49:33
>>willia+Y6
> I'm not sure sure that originality is that different between a human and a neural network. That is to say that what a human artist is doing has always involved a lot of mixing of existing creations.

I disagree, but this is a debate worth having.

This is why I disagree: humans don't just copy copyrighted material.

I am in the middle of developing and writing a romance short story. Why? Because my writing has a glaring weakness: characters, and romance stands or falls on characters. It's a good exercise to strengthen that weakness.

Anyway, both of the two people in the (eventual) couple developed from my real life, and not from any copyrighted material. For instance, the man will basically be a less autistic and less selfish version of myself. The woman will basically be the kind of person that annoys me the most in real life: bright, bubbly, always touching people, etc.

There is no copyrighted material I am getting these characters from.

In addition, their situation is not typical of such stories, but it does have connections to my life. They will (eventually) end up in a ballroom dance competition. Why that? So the male character can hate it. I hated ballroom dance: during a three-week ballroom dancing course in 6th grade, the girls made me hate it. I won't say how, but they did.

That's the difference between humans and machines: machines can only copy and mix other copyrightable material; humans can copy real life. In other words, machines can only copy a representation; humans can copy the real thing.

Oh, and the other difference is emotion. I've heard that people without the emotional center of their brains can take six hours to choose between blue and black pens. There is something about emotions that drives decision-making, and it's decision-making that drives art.

When you consider that brain chemistry, which is a function of genetics and previous choices, is a big part of emotions, then it's obvious that those two things, genetics and previous choices, are also inputs to the creative process. Machines don't have those inputs.

Those are the non-religious reasons why I think humans have more originality than machines, including neural networks.

replies(1): >>willia+Ns
◧◩◪◨⬒
47. lupire+79[view] [source] [discussion] 2022-10-16 21:51:53
>>insani+U5
YouTube does this moderation in order to avoid legal pressure from copyright holders, as in

https://en.m.wikipedia.org/wiki/Viacom_International_Inc._v.....

◧◩◪◨
48. lupire+d9[view] [source] [discussion] 2022-10-16 21:52:48
>>lbotos+T6
The problem is that YouTube AI thinks your recording is the same as every other recording, because it doesn't understand the difference between composition and recording.
◧◩◪
49. codefr+t9[view] [source] [discussion] 2022-10-16 21:55:25
>>c7b+F8
There have been a number of stories about musicians receiving copyright claims. Here is the first result on Google:

https://www.radioclash.com/archives/2021/05/02/youtuber-gets...

As for being sued for looking at source code, here is the first result on Google:

https://www.wired.com/story/missouri-threatens-sue-reporter-...

replies(2): >>c7b+Vb >>frob+ze
◧◩
50. cmdial+x9[view] [source] [discussion] 2022-10-16 21:56:07
>>ghowar+E2
Obviously this is a matter of philosophy. I am using Copilot as an assistant, and for that it works out very nicely. It's fancy code completion. I don't know who is trying to use this to write non-trivial code but that's as bad an idea as trying to pass off writing AI "prompts" as a type of engineering.

These things are tools to make more involved things. You're not going to be remembered for all the AI art you prompted into existence, no matter how many "good ones" you manage to generate. No one is going to put you into the Guggenheim for it.

Likewise, programmers aren't going to become more depraved or something by using Copilot. I think that kind of prescriptive purism needs to Go Away Forever, personally.

replies(1): >>Aeolun+6q
◧◩
51. psychp+B9[view] [source] [discussion] 2022-10-16 21:56:19
>>bayind+a7
I disagree.

Those are effectively cases of cryptomnesia[0]. Part and parcel of learning.

If you don't want broad access to your work, don't upload it to a public repository. It's very simple. Good on you for recognising that you don't agree with how GitHub uses data in public repos, but it's not their problem.

[0] https://en.m.wikipedia.org/wiki/Cryptomnesia

replies(1): >>bayind+7f
◧◩
52. cyanyd+D9[view] [source] [discussion] 2022-10-16 21:57:07
>>tpxl+j1
Basically, copyright is for people with copyright lawyers
replies(1): >>kodah+Pc
53. heavys+F9[view] [source] 2022-10-16 21:57:36
>>kweing+(OP)
Your post is a good example of the tu quoque fallacy[1].

[1] https://en.wikipedia.org/wiki/Tu_quoque

replies(1): >>kweing+Wp
54. tables+G9[view] [source] 2022-10-16 21:57:38
>>kweing+(OP)
> I’ve noticed that people tend to disapprove of AI trained on their profession’s data, but are usually indifferent or positive about other applications of AI.

In other words: the banal observation that people care far more when their stuff is stolen than when some stranger has their stuff stolen.

55. bcrosb+I9[view] [source] 2022-10-16 21:58:05
>>kweing+(OP)
I look at IP differently.

For copyright, the act of me creating something doesn't deprive you of anything, except the ability to consume or use the thing I created. If I were influenced by something, you can still be influenced by that same thing - I do not exhaust any resources I used.

This is wholly different from physical objects. If I create a knife, I deprive you of the ability to make something else from those natural resources. Natural resources that I didn't create - I merely exploited them.

Because of this, I'm fine with copyright (patents are another story). But I have some issues with physical property.

◧◩◪◨
56. tables+1a[view] [source] [discussion] 2022-10-16 22:01:15
>>faerie+D3
> You can slow this, you can't stop it whatsoever. It's about as ultimately futile as an effort as trying to stop piracy. ... But STOPPING the use of these tools? Go ahead and try, won't happen.

So? No one needs to stop it totally. The world isn't black and white, pushing it to the fringes is almost certainly a sufficient success.

Outlawing murder hasn't stopped murder, but no one's given up on enforcing those laws because of the futility of perfect success.

> If you try to outlaw it, the day before the laws come into effect, I'm going to download the very best models out there and run it on my home computer. I'll start organising with other scofflaws and building our own AI projects in the fashion of leelachesszero with donated compute time.

That sounds like a cyberpunk fantasy.

replies(2): >>faerie+ub >>throwa+Md
◧◩
57. lo_zam+2a[view] [source] [discussion] 2022-10-16 22:01:30
>>tpxl+j1
The poor are the masses, or at least part of the masses.
replies(1): >>foobar+ch
◧◩◪◨
58. c7b+9a[view] [source] [discussion] 2022-10-16 22:02:39
>>ghowar+d6
> I imagine that Disney would take issue with SD if material that Disney owned the copyright to was used in SD. They would sue. SD would have to be taken off the market.

Are you sure?

I'm not familiar with the exact data set they used for SD and whether or not Disney art was included, but my understanding is that their claim to legality comes from arguing that the use of images as training data is 'fair use'.

Anyone can use Disney art for their projects as long as it's fair use, so even if they happened to not include Disney art in SD, it doesn't fully validate your point, because they could have done so if they wanted. As long as training constitutes fair use, which I think it should - it's pretty much the AI equivalent of 'looking at others' works', which is part of a human artist's training as well.

replies(1): >>ghowar+ug
59. yjftsj+ba[view] [source] 2022-10-16 22:02:44
>>kweing+(OP)
I can think of two explanations for that off the top of my head.

The first is that people only recognize the problems with the things that they're familiar with, which you would kind of expect.

The other option is that there's a difference in the thing that people object to. My impression is that artists seem to be reacting to the idea that they could be automated out of a job, where programmers are mostly objecting to blatant copyright violation. (Not universally in either case, but often.) If that is the case, then those are genuinely different arguments made by different people.

◧◩
60. vghfgk+ea[view] [source] [discussion] 2022-10-16 22:03:02
>>ghowar+E2
The best proposal I’ve heard for dealing with the societal/economic problems this sort of A.I. poses was made by Jaron Lanier: https://youtu.be/rGqiswuJuQI?t=1190

I can see why his proposal of providing (micro-)compensation to people whose tremendous efforts end up being mined by these algorithms would not be popular with researchers/companies who stand to benefit vastly (presumably including the investors who own this site?). The lobbying power, political power, awareness, and financial resources of your average (atomised) artist/programmer/musician etc. are pretty much nil in comparison.

Forgive the clumsy analogy, but I have a feeling the whole thing might end up something like a haulage company that doesn’t want to pay any taxes to help fix the roads.
◧◩◪◨
61. chiefa+na[view] [source] [discussion] 2022-10-16 22:05:10
>>tpm+d4
To your point, the law can do a lot of things. The issue here is the clarity and ability to enforce the law.
◧◩◪
62. banana+Da[view] [source] [discussion] 2022-10-16 22:07:11
>>c7b+F8
https://twitter.com/mpoessel/status/1545178842385489923

Among many others. Classical music may have fallen into the public domain, but modern performances of it are copyrightable, and some of the big companies use copyright-matching systems, including YouTube's, that often flag new performances as copies of existing recordings.

◧◩◪◨
63. vghfgk+3b[view] [source] [discussion] 2022-10-16 22:09:53
>>lbotos+T6
…yes, as I understand it there are ‘mechanical’ rights vs. publishing rights… (for example hip hop artists may recreate a sample to avoid paying mechanical royalties, but still end up paying for publishing) https://www.lawinsider.com/dictionary/mechanical-rights
◧◩◪◨⬒
64. faerie+ub[view] [source] [discussion] 2022-10-16 22:14:00
>>tables+1a
Cyberpunk sure, but fantasy? Not at all.
replies(1): >>tables+oJ1
◧◩◪◨
65. c7b+Vb[view] [source] [discussion] 2022-10-16 22:17:43
>>codefr+t9
Ok - it is a true shame that the YouTube copyright claim system is so broken as to enable those shady practices, and that politicians still haven't upped their knowledge of the internet beyond a 'series of tubes'.

But surely the answer should be to fix the broken YT system and to educate politicians to abstain from baseless threats, not to make AI researchers pay for it?

◧◩◪◨
66. insani+Yb[view] [source] [discussion] 2022-10-16 22:18:16
>>Rimint+q7
> Sony later withdrew their copyright claim.

Right, that's my point... I can sue anyone for anything, doesn't mean I'll win.

replies(2): >>imwill+Xe >>sumedh+5g
◧◩◪◨
67. insani+3c[view] [source] [discussion] 2022-10-16 22:18:50
>>lbotos+T6
Yes, that sounds right to me. But that's not relevant to "Joe Whoever played it and got sued".
◧◩
68. XorNot+5c[view] [source] [discussion] 2022-10-16 22:19:10
>>pclmul+y4
I'm mildly suspicious that this example is an implementation of generic matrix functionality, though: you couldn't patent this sort of work, because it's not patentable - it's mathematics. It's fundamentally a basic operation that would have to be implemented with a similar structure regardless of how you do it.
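To illustrate the point (a hypothetical sketch, not the actual code from the parent comment): a row-major matrix transpose, for instance, comes out as essentially the same doubly indexed loop no matter who writes it.

```c
/* A generic row-major matrix transpose: out is cols x rows.
   Any implementation of this operation is forced into essentially
   this structure - there is nothing creative to protect, only the
   mathematics of the operation itself. */
void transpose(const double *a, double *out, int rows, int cols) {
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            out[c * rows + r] = a[r * cols + c];
}
```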
replies(2): >>pclmul+Sd >>heavys+3g
◧◩◪◨
69. insani+6c[view] [source] [discussion] 2022-10-16 22:19:13
>>pessim+c8
Yes, this is a problem with the legal system in general.
◧◩
70. XorNot+kc[view] [source] [discussion] 2022-10-16 22:20:43
>>dawner+52
> there’s definitely outputs from stable diffusion that looks like the original with some weird artifacts.

Do you have examples? Because SD will generate photoreal outputs and then get subtle details (hands, faces) wrong, but unless you have the source image in hand then you've no way of knowing whether it's a "source image" or not.

◧◩◪
71. kodah+Pc[view] [source] [discussion] 2022-10-16 22:25:57
>>cyanyd+D9
That's not even a joke. One of the premises of a copyright is that you defend your intellectual property or lose it. If the system were more equitable then it would defend your copyright.
replies(3): >>eroppl+5d >>heavys+ff >>patmor+Pk
◧◩◪◨
72. eroppl+5d[view] [source] [discussion] 2022-10-16 22:28:24
>>kodah+Pc
This is an inaccurate description of copyright, at least in the United States.

Trademarks require active defense to avoid genericization. Copyright may be asserted at the holder's discretion.

◧◩◪
73. kevin_+dd[view] [source] [discussion] 2022-10-16 22:30:18
>>insani+N4
The songwriter copyright is expired but there is still a freshly minted copyright on the video and the audio performance.

This becomes particularly onerous when trolls claim copyright on published recordings of environmental sounds that happen to be similar, but not identical, to their own; they have a legitimate claim on their original recording, but not on yours.

◧◩◪◨⬒
74. throwa+Md[view] [source] [discussion] 2022-10-16 22:34:09
>>tables+1a
You'll never be able to push it to the fringes, because there will never be universal legal agreement, even from country to country, on where to draw the line.

And as computers get more powerful and the models get more efficient it'll become easier and easier to self host and run them on your own dime. There are already one click installers for generative models such as stable diffusion that run on modest hardware from a few years back.

replies(1): >>tables+cI1
◧◩◪
75. pclmul+Sd[view] [source] [discussion] 2022-10-16 22:35:08
>>XorNot+5c
Patents and copyrights are totally different, and should be treated as such. The issue isn't about whether someone copies the algorithm, it's whether they copy the written code. Nothing in an algorithms textbook is patentable either, but if you copy the words describing an algorithm from it, you are stealing their description.
◧◩◪◨⬒
76. pclmul+ae[view] [source] [discussion] 2022-10-16 22:37:44
>>msbarn+H8
You could argue that mocking the Getty logo like that is some form of fair use, which would be a backdoor through which it can end up as a legitimate element of a public domain work, in which case it would be fair game.

I agree with you that it is also possible that people posted Getty thumbnails to some sites as though they are public domain, and that is how the AIs learned the watermark.

replies(1): >>tremon+pj
77. joecot+he[view] [source] 2022-10-16 22:38:36
>>kweing+(OP)
> For myself, I am skeptical of intellectual property in the first place. I say go for it.

If we didn't live in a Capitalist society, that would be fair. But we currently do. That Capitalist society cares little about the well-being of artists unless it can find a way to make their art profitable. Projects like DALL-E and Midjourney pillage centuries of human art and sell it back to us for a profit, while taking away work from artists who struggle to make ends meet as it is. Software developers are generally less concerned about Copilot because they're still making 6 figures a year, but they'll start to get concerned if the technology gets smart enough that society needs fewer developers.

An automated future should be a good thing. It should mean that computers can take care of most tasks and humans can have more leisure time to relax and pursue their passions. The reason that artists and developers panic over things like this is that they are watching themselves be automated out of existence, and have seen how society treats people who aren't useful anymore.

78. lucide+oe[view] [source] 2022-10-16 22:39:13
>>kweing+(OP)
I don't know specifically what DALL-E was trained on, but if it's art for which the artists have not consented to it being used to train AI then that's problematic. I haven't seen any objections to DALL-E on that basis specifically though, whereas all the discussion of Copilot is around the fact that code authorship & Github accounts are not intrinsically tied together, making it impossible to have code authors consent to their code being used, regardless of what ToS someone's agreed to.

> For myself, I am skeptical of intellectual property in the first place. I say go for it.

I'm in a similar boat but this is precisely the reason I object so strongly to Copilot. IP has been invented & perpetuated/extended to protect large corporate interests, under the guise of protecting & sustaining innovators & creative individuals. Copilot is a perfect example of large corporate interest ignoring IP when it suits them to exploit individuals.

In other words: the reason I'm skeptical of IP is the same reason I'm skeptical of Copilot.

replies(1): >>__alex+Yf
◧◩◪◨
79. frob+ze[view] [source] [discussion] 2022-10-16 22:40:34
>>codefr+t9
Just to be clear, because it's in the title, the reporter was threatened with a lawsuit for looking at source code. I cannot find anyone actually sued for it. BTW, here's an article saying said reporter wasn't sued: https://www.theregister.com/AMP/2022/02/15/missouri_html_hac...

Anyone with a mouth can run it and threaten a lawsuit. In fact, I threaten to sue you for misinformation right now unless you correct your post. Fat lot of good my threat will do, because no judge in their right mind would entertain said lawsuit - it's baseless.

◧◩◪◨⬒
80. imwill+Xe[view] [source] [discussion] 2022-10-16 22:44:05
>>insani+Yb
It worked out justly in this case.

In the VAST majority of cases, it does not.

◧◩◪
81. imwill+4f[view] [source] [discussion] 2022-10-16 22:45:16
>>stickf+b8
This makes no sense.

Absolutely nobody is arguing to make the world shittier.

◧◩◪
82. bayind+7f[view] [source] [discussion] 2022-10-16 22:45:59
>>psychp+B9
> Those are effectively cases of cryptomnesia.

Disagree, outputting training data as-is is not cryptomnesia. This is not Copilot's first case. It also reproduced id Software's fast inverse square root function as-is, including its comments, but without its license.
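For reference, that routine is the famous one below (sketched here with memcpy in place of the original's pointer cast, which is undefined behavior in standard C; the magic constant and Newton step are what made it instantly recognizable in Copilot's output):

```c
#include <string.h>

/* The Quake III "fast inverse square root": approximates 1/sqrt(x).
   A bit-level initial guess plus one Newton-Raphson step gives
   roughly 0.2% accuracy. */
float q_rsqrt(float number) {
    float x2 = number * 0.5f;
    float y  = number;
    unsigned int i;
    memcpy(&i, &y, sizeof i);     /* reinterpret the float's bits */
    i = 0x5f3759df - (i >> 1);    /* magic-constant initial guess */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - x2 * y * y);  /* one Newton-Raphson refinement */
    return y;
}
```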

> If you don't want broad access your work, don't upload it to a public repository. It's very simple.

This is actually both funny and absurd. This is why we have licenses in the first place. If all licenses are moot, then this opens a very big can of worms...

My terms are simple. If you derive, share the derivation with the same license (xGPL). Copilot is deriving my code. If you use my code as a derivation point, honor the license, mark the derivation with GPL license. This voids your business case? I don't care. These are my terms.

If any public item can be used without any limitations, Getty Images (or any other stock photo business) is illegal. CC licensing shouldn't exist. GPL is moot. Even the most litigious software companies' cases (Oracle, SCO, Microsoft, Adobe, etc.) are moot. Just don't put it on public servers, eh?

Similarly, music and other fine arts are generally publicly accessible. So copyright on any and every production is also invalid as you say, because it's publicly available.

Why not put your case forward with attorneys of Disney, WB, Netflix and others? I'm sure they'll provide all their archives for training your video/image AI. Similarly Microsoft, Adobe, Mathworks, et al. will be thrilled to support your CoPilot competitor with their code, because a) Any similar code will be just cryptomnesia, b) The software produced from that code is publicly accessible anyway.

At this point, I haven't even touched on the fact that humans learn very differently from neural networks.

replies(4): >>tremon+7h >>psychp+Bh >>Aeolun+Hq >>stale2+4w
◧◩◪◨
83. heavys+ff[view] [source] [discussion] 2022-10-16 22:46:53
>>kodah+Pc
You're thinking of trademarks.
replies(1): >>kodah+FU3
◧◩◪
84. snarfy+Ef[view] [source] [discussion] 2022-10-16 22:51:36
>>willia+A3
> What did the recording do to the live musician?

The recording destroyed the occupation of being a live musician. People still do it for what amounts to tip money, but it used to be a real job that people could make a living off of. If you had a business and wanted to differentiate it by having music, you had to pay people to play it live. It was the only way.

replies(1): >>willia+bs
◧◩◪
85. zopa+Nf[view] [source] [discussion] 2022-10-16 22:52:31
>>stickf+b8
Because we’re not the ones with the power. People with limited power pick the fights they might win, not the fights that maximize total welfare for everyone including large copyright holders. There’s no moral obligation to be a philosopher king unless you’re actually on a throne.
◧◩
86. __alex+Yf[view] [source] [discussion] 2022-10-16 22:54:14
>>lucide+oe
Stable Diffusion and DallE were both trained on copyrighted content scraped from the internet with no consent from the publishers.

It's quite a common complaint because some of the most popular prompts involve just appending an artist's name to something to get it to copy their style.

◧◩◪
87. heavys+3g[view] [source] [discussion] 2022-10-16 22:54:44
>>XorNot+5c
Mathematics is not patentable, but you can patent the steps a computer takes to compute the results of that particular algorithm.
replies(1): >>pclmul+vm
◧◩◪◨⬒
88. sumedh+5g[view] [source] [discussion] 2022-10-16 22:54:50
>>insani+Yb
> I can sue anyone for anything, doesn't mean I'll win.

You can't sue if you don't have money; a big corp can sue even if they know they are wrong.

◧◩◪◨⬒
89. tremon+cg[view] [source] [discussion] 2022-10-16 22:56:20
>>insani+s5
Fair use is an affirmative defense. Others can still sue you for copying, and you will have to hope a judge agrees with your defense. How do you think Google v. Oracle would have turned out if Google's defense was "no your honor, we didn't copy the Java sources. We just used those sources as input to our creative algorithms, and this is what they independently produced"?

> If I take a song, cut it up, and sing over it, my release is valid

"valid", how? You still have to pay royalties to the copyright holder of the original song, and you don't get to claim it as your own.

replies(1): >>stale2+Hv
◧◩◪
90. __alex+fg[view] [source] [discussion] 2022-10-16 22:56:35
>>willia+A3
> What did the photograph do to the portrait artist?

It completely destroyed the jobs of photorealistic portrait artists. You only have stylised portrait painting now, and now that is going to be ripped off too.

replies(1): >>willia+hs
91. matheu+sg[view] [source] 2022-10-16 22:59:06
>>kweing+(OP)
> For myself, I am skeptical of intellectual property in the first place. I say go for it.

Me too. I think copyright and these silly restrictions should be abolished.

At the same time, I can't get over the fact these self-serving corporations are all about "all rights reserved" when it benefits them while at the same time undermining other people's rights. Microsoft absolutely knows that what they're doing is wrong. Recently it was pointed out to me that Microsoft employees can't even look at GPL source code, lest they subconsciously reproduce it. Yet they think their software can look at other people's code and reproduce it? What a load of BS.

I'll forgive them for going for it the second copyright is gone. Then it won't be a crime for any of us to copy Windows and Office either. You bet we're gonna go for it too.

replies(1): >>Schroe+Es
◧◩◪◨⬒
92. ghowar+ug[view] [source] [discussion] 2022-10-16 22:59:34
>>c7b+9a
> Are you sure?

Yes, I'm sure.

> I'm not familiar with the exact data set they used for SD and whether or not Disney art was included, but my understanding is that their claim to legality comes from arguing that the use of images as training data is 'fair use'.

They could argue that. But since the American court system is currently (almost) de facto "richest wins," their argument will probably not mean much.

The way to tell if something was in the dataset would be to use the name of a famous Disney character and see what it pulls up. If it's there, then once the Disney beast finds out, I'm sure they'll take issue with it.

And by the way, I don't buy all of the arguments for machine learning as fair use. Sure, for the training itself, yes, but once the model is used by others, you now have a distribution problem.

More in my whitepaper against Copilot at [1].

[1]: https://gavinhoward.com/uploads/copilot.pdf

replies(1): >>Stagna+yX
◧◩
93. foobar+zg[view] [source] [discussion] 2022-10-16 22:59:48
>>tpxl+j1
> one set of rules for the poor, another set of rules for the masses

Presumably by "the masses" you meant "the large corporations"?

Usually, "the masses" means "the common people" ... i.e. not much different from "the poor."

If you meant corporations, I'm 100% behind this comment.

replies(1): >>debugn+2q
◧◩◪
94. makeit+Og[view] [source] [discussion] 2022-10-16 23:01:38
>>rtkwe+o8
I think this is exactly the gap the GP is mentioning: to a trained artist it is clear as day that the original image has been lifted wholesale, even if, for instance, the colors are adjusted here and there.

You put it as a remix, but remixes are credited and expressed as such.

replies(2): >>jzb+Jh >>omnimu+li
◧◩
95. foobar+Vg[view] [source] [discussion] 2022-10-16 23:03:00
>>tpxl+j1
I'm not even a good cellist and YouTube has put copyright claims on the crappy practice videos I have of me playing Saint-Saëns.
replies(1): >>Quantu+6j
◧◩◪◨
96. tremon+7h[view] [source] [discussion] 2022-10-16 23:03:35
>>bayind+7f
> Disagree, outputting training data as-is is not cryptomnesia

Outputting training data as-is without attribution is just plain plagiarism. You don't get to put verbatim text from textbooks in your academic papers either.

◧◩◪
97. foobar+ch[view] [source] [discussion] 2022-10-16 23:04:20
>>lo_zam+2a
Yeah, presumably this was an editing error and he meant "the corporations."
◧◩◪
98. mr_toa+sh[view] [source] [discussion] 2022-10-16 23:07:35
>>rtkwe+o8
> I haven't been following super closely but I don't know of any claims or examples where input images were recreated to a significant degree by stable diffusion.

I think that the argument being made by some artists is that the training process itself violates copyright just by using the training data.

That’s quite different from arguing that the output violates copyright, which is what the tweet in this case was about.

replies(1): >>rtkwe+En
99. 9wzYQb+th[view] [source] 2022-10-16 23:07:45
>>kweing+(OP)
> [people] are usually indifferent or positive about other applications of AI

That sounds like the pro-innovation bias: https://en.m.wikipedia.org/wiki/Pro-innovation_bias

◧◩◪◨
100. psychp+Bh[view] [source] [discussion] 2022-10-16 23:08:30
>>bayind+7f
It's funny to say id's fast inverse square root. Conway certainly didn't come up with the algorithm or the magic number.

But your reasoning boils down to "I don't like it, so it mustn't be that way." That has never necessarily been true.

At any rate, piracy is rampant, so clearly a large body of people don't think even a direct copy is morally wrong. Let alone something similar.

You're acting as though there are constant won and lost cases over plagiarism. Ed Sheeran seems to defend his work weekly. Every case that goes to court means reasonable minds differ on the interpretation of plagiarism legally.

So what's your point?

Because it seems the main thrust of your argument is that I should argue with Microsoft instead (*who own GitHub, lol*)? That's all you've got to hold back the tide of AI? An appeal to authority?

replies(1): >>bayind+2W
◧◩◪◨
101. jzb+Jh[view] [source] [discussion] 2022-10-16 23:09:13
>>makeit+Og
I haven’t seen any side by sides that seem like a lift. Any examples?

I don’t see Midjourney (et al) as remixes, myself. More like “inspired by.”

replies(3): >>omnimu+Ki >>keving+Si >>matkon+Ek
◧◩
102. epolan+Lh[view] [source] [discussion] 2022-10-16 23:09:44
>>ghowar+E2
What makes them bad?

I am against Copilot because Microsoft is training the models on public data while disregarding copyright (also, it doesn't include its own code).

replies(2): >>ghowar+ji >>Gigach+RE
◧◩◪◨
103. makeit+Uh[view] [source] [discussion] 2022-10-16 23:11:20
>>alxlaz+x6
> Assuming she could afford the lawyer, and that she lives through the stress and occasional mistreatment by the authority,

To add to that, there are provisions to lock her out of pushing new videos to the platform if the number of unresolved copyright claims passes some low number (3?).

So she loses new revenue until her claims prevail, and of course the entity making the claim knows that and has no incentive to help her (don't they even get the monetization from her videos in the meantime?)

◧◩◪
104. ghowar+ji[view] [source] [discussion] 2022-10-16 23:14:05
>>epolan+Lh
Because they centralize control, as I said in [1].

Put another way, AI's are tools that give more power to already powerful entities.

[1]: https://news.ycombinator.com/item?id=33227303

◧◩
105. Aeolun+ki[view] [source] [discussion] 2022-10-16 23:14:07
>>tpxl+j1
Diffusion of responsibility. When Joe rando plays his song, it’s easy to see the offender and do something about it.

When it’s a faceless mass of 100k employees…? Not so much.

replies(1): >>moron4+zj
◧◩◪◨
106. omnimu+li[view] [source] [discussion] 2022-10-16 23:14:16
>>makeit+Og
Exactly. To a programmer, Copilot is a clear violation; to a writer, GPT-3 is a clear violation; to an artist, DALL-E 2 is a clear violation. Meanwhile, the artist might love Copilot, the writer might love DALL-E, and the programmer might love GPT-3.

It's all the same; they just don't realize it.

replies(1): >>sidewn+to
◧◩◪
107. Americ+Ai[view] [source] [discussion] 2022-10-16 23:16:22
>>rtkwe+o8
I don’t think copilot is intrinsically a copyright violation, as you seem to be alluding to. Examples like this seem to be more controversial, but I’m not sure there’s a clear copyright violation there either.

If you asked every developer on earth to implement FizzBuzz, how many actually different implementations would you get? Probably not very many. Who should own the copyright for each of them? Would the outcome be different for any other product feature? If you asked every dev on earth to write a function that checked a JWT claim, how many of them would be more or less exactly the same? Would that be a copyright violation? I hope the courts answer some of these questions one day.
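To make that concrete, here is a sketch of the FizzBuzz that nearly every developer converges on; independent implementations can differ in little more than naming and loop style:

```c
#include <stdio.h>
#include <string.h>

/* Classic FizzBuzz: write the label for i into buf. */
void fizzbuzz(int i, char *buf, size_t len) {
    if (i % 15 == 0)      snprintf(buf, len, "FizzBuzz");
    else if (i % 3 == 0)  snprintf(buf, len, "Fizz");
    else if (i % 5 == 0)  snprintf(buf, len, "Buzz");
    else                  snprintf(buf, len, "%d", i);
}
```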

replies(4): >>stuart+0j >>datafl+sj >>reacha+Us >>didibu+lB
108. maxbon+Di[view] [source] 2022-10-16 23:16:51
>>kweing+(OP)
> I’ve noticed that people tend to disapprove of AI trained on their profession’s data, but are usually indifferent or positive about other applications of AI.

This is a fascinating observation and I think there's a lot of truth to it. But maybe our inference should be that these systems mistreat each of us, even if it's difficult to see unless it's falling on you.

Maybe a more important question than whether or not this is a violation of intellectual property is whether this is a violation of human dignity, not that it's illegal (though in this case, it may be) but that it's extremely rude in a way that we don't necessarily have the vocabulary for yet.

◧◩◪◨⬒
109. omnimu+Ki[view] [source] [discussion] 2022-10-16 23:18:08
>>jzb+Jh
It's clear where the know-how was lifted from; it doesn't matter if the final image is somewhat unique (almost every image is).
replies(1): >>matkon+Fk
◧◩◪◨⬒
110. keving+Si[view] [source] [discussion] 2022-10-16 23:18:57
>>jzb+Jh
Not safe for work, but one example I saw going around:

https://twitter.com/ebkim00/status/1579485164442648577

Not sure if this was fed the original image as an input or not.

Also seen a couple cases where people explicitly trained a network to imitate an artist's work, like the deceased Kim Jung Gi.

replies(2): >>lbotos+vk >>rtkwe+iE1
◧◩◪◨
111. stuart+0j[view] [source] [discussion] 2022-10-16 23:20:25
>>Americ+Ai
> If you asked every developer on earth to implement FizzBuzz, how many actually different implementations would you get?

Thousands at least. Some of which would actually work.

replies(1): >>Americ+Bj
◧◩◪
112. Quantu+6j[view] [source] [discussion] 2022-10-16 23:20:47
>>foobar+Vg
I suspect a video of you playing literally anything on the cello--even an improvised song or a random motif--is likely to get reported as a copyright violation when uploaded to YouTube.
replies(1): >>foobar+jG
◧◩◪◨⬒⬓
113. tremon+pj[view] [source] [discussion] 2022-10-16 23:23:37
>>pclmul+ae
Fair use does not make a work public domain; it merely helps the creator of the derivative work defend their case in court. But neither the original nor the derivative becomes public domain after a successful fair use defense.

Not a lawyer, of course, but I think slapping the Getty logo on a work claiming "fair use" and then releasing the work under public domain would be a case of misrepresentation, because Getty still has a copyright claim on your work. Regardless of the copyright status, it's still a clear trademark violation to me.

replies(1): >>pclmul+7L
◧◩◪◨
114. datafl+sj[view] [source] [discussion] 2022-10-16 23:23:54
>>Americ+Ai
> If you asked every developer on earth to implement FizzBuzz, how many actually different implementations would you get?

Does it matter? If you examined every copyright lawsuit on earth over code, how many of them would actually be over FizzBuzz?

replies(1): >>Americ+Gj
◧◩◪
115. moron4+zj[view] [source] [discussion] 2022-10-16 23:24:28
>>Aeolun+ki
The "Joe Rando" example is of playing a song that predates the copyright system.
◧◩◪◨⬒
116. Americ+Bj[view] [source] [discussion] 2022-10-16 23:24:33
>>stuart+0j
There’s a finite number of ways to implement a working FizzBuzz (or anything else) in any given language that aren’t substantially similar, is my point. At least without introducing pointless code for the explicit purpose of making it look different.
◧◩◪
117. paulgb+Fj[view] [source] [discussion] 2022-10-16 23:25:16
>>rtkwe+o8
I don’t know of any examples of images being wholly recreated, but it’s certainly possible to use the name of some living artists to get work in their style. In those cases, it seems like not such a leap to say that the AI has obviously seen that artist’s work and that the output is a derivative work. (The obvious counterargument is that this is the same as a human looking at an artist’s work and aping the style.)
replies(3): >>matkon+zk >>Spivak+Ak >>nl+Xz
◧◩◪◨⬒
118. Americ+Gj[view] [source] [discussion] 2022-10-16 23:25:36
>>datafl+sj
The same rationale applies to any other simple code block, as I elaborated on.
replies(1): >>datafl+Wj
◧◩
119. znpy+Kj[view] [source] [discussion] 2022-10-16 23:26:37
>>tpxl+j1
Well copyright and intellectual property are made-up concepts anyway.

It’s only logical that people twist and bend the rules.

Anyway, this kind of bs will go on until people start bringing companies to court over this.

I still don’t get why lawyers don’t start offering their services for a cut of the damages… there is probably good money to be made by suing companies that put copyright-infringing AI in production.

120. teawre+Uj[view] [source] 2022-10-16 23:28:22
>>kweing+(OP)
When it comes to solving a problem, I want it to emit whatever solves the problem.

When it comes to being an AI that understands coding concepts, I don't want it to regurgitate code verbatim.

When it comes to being a product, I don't want it to plagiarize.

◧◩◪◨⬒⬓
121. datafl+Wj[view] [source] [discussion] 2022-10-16 23:29:34
>>Americ+Gj
And my point is you don't have lawsuits over one simple code block.
replies(1): >>Americ+uk
◧◩
122. versio+9k[view] [source] [discussion] 2022-10-16 23:32:16
>>tpxl+j1
You have four examples of using replicable stuff that has been shared publicly, and you call two of them stealing.

All I can take away from this is the absurdity of intellectual property laws in general. I agree with the GP: if people are sharing stuff, it's fair game. If you don't want people using stuff you made, keep it to yourself. Pretending we can control how freely available info is used is silly anyway.

◧◩◪◨⬒⬓⬔
123. Americ+uk[view] [source] [discussion] 2022-10-16 23:37:05
>>datafl+Wj
This entire thread is about how copilot committed a copyright violation on a simple code block.
replies(1): >>datafl+pl
◧◩◪◨⬒⬓
124. lbotos+vk[view] [source] [discussion] 2022-10-16 23:37:10
>>keving+Si
It's really interesting. I suspect the face was inpainted in, or this was an "img2img".

I think over time we are going to see the following:

- If you take, say, a Star Wars poster, inpaint a trained face over Luke's, and sell that to people as a service, you will probably be approached for copyright and trademark infringement.

- If you are doing the above with a satirical take, you might be able to claim fair use.

- If you are using AI as a "collage generator" to smash together a ton of prompts into a "unique" piece, you may be safe from infringement but you are taking a risk as you don't know what % of source material your new work contains. I'd like to imagine if you inpaint in say 20 details with various sub-prompts that you are getting "safer".

replies(1): >>numpad+Jr
125. jrm4+wk[view] [source] 2022-10-16 23:37:12
>>kweing+(OP)
Again, as a lawyer, I think it's really important to focus on intent. The Constitution gives us "To promote the Progress of Science and useful Arts" (which we've expanded).

So question one in the back of our heads should be "Are we promoting progress here?" That most often means protecting the little guy, and that's why I think it's mostly necessary, and also must be evaluated very skeptically.

replies(1): >>Shamel+rl
◧◩◪◨
126. matkon+zk[view] [source] [discussion] 2022-10-16 23:37:45
>>paulgb+Fj
https://alexanderwales.com/wp-content/uploads/2022/08/image....

Left: “Girl with a Pearl Earring, by Johannes Vermeer” by Stable Diffusion. Right: Girl with a Pearl Earring by Johannes Vermeer.

This specific one is not a copyright violation, as the original is old enough for its copyright to have expired. But the same may happen with other images.

from https://alexanderwales.com/the-ai-art-apocalypse/ and https://alexanderwales.com/addendum-to-the-ai-art-apocalypse...

replies(2): >>london+sm >>rtkwe+gn
◧◩◪◨
127. Spivak+Ak[view] [source] [discussion] 2022-10-16 23:37:45
>>paulgb+Fj
It’s not a copyright violation to commission an artist to make you something in the style of another artist, and it’s also not copyright infringement for the artist you hired to look at that artist’s work to learn what that style means. And it’s also not always infringement to draw another artist’s work in your own style, same as reimplementing code.

If you “trace” another artist’s work, though, the hammer comes down. With Copilot it’s way easier to get it to obviously trace.

replies(1): >>rfrec0+bK
◧◩◪◨⬒
128. matkon+Ek[view] [source] [discussion] 2022-10-16 23:38:22
>>jzb+Jh
https://alexanderwales.com/wp-content/uploads/2022/08/image....

Left: “Girl with a Pearl Earring, by Johannes Vermeer” by Stable Diffusion. Right: Girl with a Pearl Earring by Johannes Vermeer.

This specific one is not a copyright violation, as the original is old enough for its copyright to have expired. But the same may happen with other images.

from https://alexanderwales.com/the-ai-art-apocalypse/ and https://alexanderwales.com/addendum-to-the-ai-art-apocalypse...

replies(1): >>Fillig+Kt
◧◩◪◨⬒⬓
129. matkon+Fk[view] [source] [discussion] 2022-10-16 23:38:53
>>omnimu+Ki
style is not copyrightable under current rules
replies(1): >>omnimu+RV1
◧◩◪◨
130. patmor+Pk[view] [source] [discussion] 2022-10-16 23:40:13
>>kodah+Pc
Your point about losing copyright is incorrect. But copyright absolutely was designed with corporations in mind, not small individual creators. It was designed around TV, movies, and print publishers, not around YouTube and Patreon.
◧◩◪
131. pavlov+il[view] [source] [discussion] 2022-10-16 23:44:42
>>rtkwe+o8
Stable Diffusion sometimes reproduces the large watermarks used by stock photo providers on their free sample images. That’s embarrassing at the minimum, and potentially a trademark violation.
replies(1): >>bigiai+6s
◧◩◪◨⬒⬓⬔⧯
132. datafl+pl[view] [source] [discussion] 2022-10-16 23:45:14
>>Americ+uk
That code block is neither "simple like FizzBuzz" nor is it in a lawsuit. I feel like we're speaking past each other at this point.
replies(1): >>Americ+mm
◧◩
133. Shamel+rl[view] [source] [discussion] 2022-10-16 23:45:30
>>jrm4+wk
Define progress.

Good luck.

replies(1): >>jrm4+e14
◧◩◪◨⬒⬓⬔⧯▣
134. Americ+mm[view] [source] [discussion] 2022-10-16 23:53:17
>>datafl+pl
What makes it not simple like FizzBuzz? You will not be able to come up with a reason why this one single function is copyrightable, but a FizzBuzz function isn’t. It’s one function in 15 lines of code. Get 1,000,000 developers to implement that function and you’re not going to have anywhere near 1,000,000 substantially different implementations.
replies(2): >>datafl+4n >>monoca+JO3
◧◩◪◨⬒
135. london+sm[view] [source] [discussion] 2022-10-16 23:54:39
>>matkon+zk
I think this happens a lot with famous images since that image will be in the training set hundreds of times.

Even if deduplication efforts are done, that painting will still be in the background of movie shots etc.

◧◩◪◨
136. pclmul+vm[view] [source] [discussion] 2022-10-16 23:55:02
>>heavys+3g
Only if it has physical consequences. There was a case in 2014 that narrowed software patents significantly, called "Alice vs CLS Bank." No more patents on computerized shopping carts, but encryption or compression can still be patented.
◧◩
137. csalle+Pm[view] [source] [discussion] 2022-10-16 23:57:14
>>ghowar+E2
> while at the same time, they take power away from individuals.

Stable Diffusion and DALL-E give a ton of power to individuals, hence why they are popular.

It feels like you're doing a cost analysis instead of a cost-benefit analysis, i.e. you're only looking at the negatives. It's a bit like saying cars are bad because they give more power to the big companies who sell them + put horse and buggy operators out of a job.

replies(1): >>ghowar+4q
◧◩◪
138. DannyB+Tm[view] [source] [discussion] 2022-10-16 23:57:50
>>rtkwe+o8
Well no.

Code is only protected to the degree it is creative and not functionally driven anyway.

So the reduced band of possible expression often directly reduces the protectability-through-copyright.

◧◩◪◨⬒⬓⬔⧯▣▦
139. datafl+4n[view] [source] [discussion] 2022-10-16 23:58:48
>>Americ+mm
For one thing FizzBuzz is like... 5-6 statements? This function has 13. FizzBuzz has a whopping 1 variable to keep track of. This function has so many I'm not even going to try to count. I'm not going to keep arguing about this, but if you want to believe they're equally simple then you'll just have a hard time convincing other people. That's all I have left to say on this.
replies(2): >>CapsAd+ls >>SAI_Pe+pz
◧◩◪◨⬒
140. rtkwe+gn[view] [source] [discussion] 2022-10-17 00:00:02
>>matkon+zk
> Left: “Girl with a Pearl Earring, by Johannes Vermeer” by Stable Diffusion Right: Girl with a Pearl Earring by Johannes Vermeer

Even that, if done by a person, would as far as I understand not constitute copyright infringement. It's a separate work mimicking Vermeer's original. The closest real-world equivalent I can think of is probably the Obama "Hope" case, AP v. Shepard Fairey, but that settled out of court, so we don't really know what the legal status of that kind of reproduction is. On top of that, though, the SD image isn't just a recoloring with some additions like Fairey's was, so it's not quite as close to the original as that case is.

replies(2): >>blende+fJ >>matkon+tz5
◧◩◪◨
141. rtkwe+En[view] [source] [discussion] 2022-10-17 00:02:16
>>mr_toa+sh
I'm dubious of that in cases where the training set isn't distributed. If we call the training itself copyright infringement, is downloading an image infringement? Is caching?
replies(1): >>didibu+sA
◧◩
142. okosla+Ln[view] [source] [discussion] 2022-10-17 00:02:48
>>pclmul+y4
In my mentoring/tutoring experience, students also resist comprehension when a copy is available.
143. sattos+3o[view] [source] 2022-10-17 00:04:56
>>kweing+(OP)
While I personally wouldn’t care about it, I can understand someone taking offense at copilot for spitting out their code verbatim and claiming it isn’t theirs.

Neither GPT nor DALL-E produces content that anyone can point to and say “they are laundering MY work”.

The closest we’ve been to that point is the image generators spitting out copyright watermarks, but they are not clearly attributable to any one single image (afaik).

◧◩◪◨⬒
144. sidewn+to[view] [source] [discussion] 2022-10-17 00:08:48
>>omnimu+li
Does dalle-2 verbatim reproduce artwork? I have never used it.
replies(1): >>CapsAd+fr
◧◩
145. kweing+yp[view] [source] [discussion] 2022-10-17 00:18:28
>>teddyh+a4
I’m not making an argument.
◧◩
146. kweing+Wp[view] [source] [discussion] 2022-10-17 00:22:06
>>heavys+F9
Well, it would be fallacious reasoning if I was using this as the basis of an argument.

I didn’t intend to argue anything or draw any conclusions. Just making an observation based on conversations with friends and coworkers.

replies(1): >>heavys+hx
◧◩◪
147. debugn+2q[view] [source] [discussion] 2022-10-17 00:23:02
>>foobar+zg
I think they mixed "for the rich … for the masses" and "for the poor … for corporations" while writing. But it's clear they meant contrasting terms.
◧◩◪
148. ghowar+4q[view] [source] [discussion] 2022-10-17 00:23:41
>>csalle+Pm
I explained more in my comment at [1].

The big difference is that cars were a tool that helped regular people by being a force multiplier. Stable Diffusion and DALL-E are not force multipliers in the same way. Sure, you may now produce images that you couldn't before, but there are far fewer profitable uses for images than for cars. Images don't materially affect the world, but cars can.

[1]: https://news.ycombinator.com/item?id=33227303

replies(1): >>csalle+8P
◧◩◪
149. Aeolun+6q[view] [source] [discussion] 2022-10-17 00:23:52
>>cmdial+x9
I think the methods can be unsavory even if the result is nice.

Yes, the way Copilot was trained was morally questionable, but probably legally fine (GitHub terms of service).

There is no doubt the result is extremely helpful though.

◧◩
150. Aeolun+tq[view] [source] [discussion] 2022-10-17 00:27:21
>>bayind+a7
> training on with GPL/LGPL licensed code without consent is not

That’s actually fine (kind of the idea of specifying a license). What is not fine is using that code in non-GPL licensed code.

replies(1): >>bayind+xW
◧◩◪◨
151. Aeolun+Hq[view] [source] [discussion] 2022-10-17 00:29:57
>>bayind+7f
> If all the licenses is moot, then this opens a very big can of worms

We are talking ‘de facto’ here, not ‘de jure’. It may be legally problematic, but anything made public once is never going back in the box.

◧◩◪◨⬒⬓
152. CapsAd+fr[view] [source] [discussion] 2022-10-17 00:39:31
>>sidewn+to
It's kind of like having millions of parameters you can tweak to get to an image. So an image does not really exist in the model.

I can imagine Mona Lisa in my head, but it doesn't really "exist" verbatim in my head. It's only an approximation.

I believe copilot works the same way (?)

replies(2): >>heavys+ws >>hacker+xA
153. Taylor+tr[view] [source] 2022-10-17 00:40:59
>>kweing+(OP)
I am strongly against intellectual property, but I don’t like this idea that any one of us will get in big trouble for openly violating IP restrictions, but if one of these big companies scoops up copyrighted works for their AI it’s fine? The double standard is unfair. This all seems like a great opportunity for big companies to encourage the growth of Creative Commons, which would benefit everyone, but instead they’re making large private datasets only they control.
◧◩◪◨⬒⬓⬔
154. numpad+Jr[view] [source] [discussion] 2022-10-17 00:43:18
>>lbotos+vk
Features outside the face are lost/changed from the original on the right, so it can’t be face inpainting. Unlikely to be style transfer, because some body parts are moved. Most plausibly this was generated.

So much for “generation”: it seems these models are just overfitting on an extremely small subset of the input data that they did not utterly fail to train on, almost as if a genius could generate the weight data directly from those images without all the gradient-descent machinery.

◧◩◪
155. bigiai+Mr[view] [source] [discussion] 2022-10-17 00:43:32
>>rtkwe+o8
> With stable diffusion it's much closer to someone remixing and aping the images than it is reproducing originals.

So very similar to how the music industry treats sampling then?

Everybody using CoPilot needs to get "code sample clearance" from the original copyright holder before publishing their remix or new program that uses snippets of somebody else's code...

Try explaining _that_ to your boss and legal department.

"To: <all software dev> Effective immediately, any use of Github is forbidden without prior written approval from both the CTO and General Councel."

replies(1): >>kmeist+Sx
◧◩◪◨
156. bigiai+6s[view] [source] [discussion] 2022-10-17 00:46:44
>>pavlov+il
Surely at the very least it'd be a TOS violation? I doubt any stock photo service grants you enough rights to redistribute their watermarked free image samples? Especially not in the context of a project like Stable Diffusion?
replies(1): >>Fillig+mt
◧◩◪◨
157. willia+bs[view] [source] [discussion] 2022-10-17 00:47:20
>>snarfy+Ef
It also gave birth to the recording artist. It certainly didn’t get rid of musicians.
◧◩◪◨
158. willia+hs[view] [source] [discussion] 2022-10-17 00:48:03
>>__alex+fg
It also gave birth to the photographic portrait artist. It certainly did not get rid of portrait artists in general.

This was of course a leading question. The point was to get you to think about what artists did in response to the photograph. They changed the way they paint.

I'm positive that machine learning will also change the way that people create art, and I am positive that it will only add to the rich tapestry of creative possibilities. There are still realistic portrait painters, after all; they're just not as numerous.

◧◩◪◨⬒⬓⬔⧯▣▦▧
159. CapsAd+ls[view] [source] [discussion] 2022-10-17 00:48:16
>>datafl+4n
It doesn't seem that far off to me. Copyright makes more sense in a larger context, such as making a Windows clone by copy pasting code from some Windows leak.

Without that context, fizzbuzz is not that different from a matrix transpose function to me.
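For scale, a plain dense transpose really is about as short and as shape-constrained as FizzBuzz (a hypothetical stand-in of mine, not the specific function debated upthread):

```python
def transpose(m):
    # zip(*m) pairs up the i-th elements of every row,
    # i.e. it yields the columns of m as tuples.
    return [list(row) for row in zip(*m)]
```

There's essentially one idiomatic way to write this; a loop-based version is the only common alternative.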

160. zahrc+us[view] [source] 2022-10-17 00:50:49
>>kweing+(OP)
Quod licet Iovi, non licet bovi[0]

[0] https://en.m.wikipedia.org/wiki/Quod_licet_Iovi,_non_licet_b...

◧◩◪◨⬒⬓⬔
161. heavys+ws[view] [source] [discussion] 2022-10-17 00:51:09
>>CapsAd+fr
NNs can and do encode information from their training sets in the models themselves, sometimes verbatim.

Sometimes the original information is there in the model, encoded/compressed/however you want to look at it, and can be reproduced.

◧◩
162. Schroe+Es[view] [source] [discussion] 2022-10-17 00:52:18
>>matheu+sg
> Then it won't be a crime for any of us to copy Windows and Office either. You bet we're gonna go for it too.

Don't worry. At that time all of the available hardware will refuse to run any software unless it comes with a signed license from one of the big three.

◧◩◪◨⬒⬓
163. willia+Ns[view] [source] [discussion] 2022-10-17 00:53:46
>>ghowar+N8
Asked to give practical advice to starting writers, he said, “Read.”

https://www.nytimes.com/2022/09/30/books/early-cormac-mccart...

replies(2): >>ghowar+1x >>roboca+4B
◧◩
164. aejnsn+Ss[view] [source] [discussion] 2022-10-17 00:54:48
>>tpxl+j1
Wowwww. This is exactly what I was thinking, but I couldn’t put it into such a terse simple example. +1
◧◩◪◨
165. reacha+Us[view] [source] [discussion] 2022-10-17 00:55:12
>>Americ+Ai
Copyright is for original whole works. Utility functions don’t fall under that, I don’t think.

I suppose whoever wants to pay the fees would “own” these things?

https://www.copyright.gov/circs/circ61.pdf

◧◩◪◨
166. cortes+dt[view] [source] [discussion] 2022-10-17 00:58:43
>>ghowar+d6
How is this situation made any worse by these AI systems?

If a small time artist has their work stolen, they probably won't be able to fight it very well. They might be able to get a few taken down, but the sheer number will make it impossible to keep up.

Disney, on the other hand, will have armies of lawyers going after any copyright violation.

It seems the same whether AI is involved or not.

replies(1): >>ghowar+Sw
◧◩◪◨
167. ndiddy+kt[view] [source] [discussion] 2022-10-17 00:59:44
>>cipher+q6
I have dealt with fraudulent Youtube copyright claims, it's a long and annoying process. First, you have to file a dispute, which typically the claimant will automatically reject, and then you have to escalate to a DMCA counternotice, which will take the video offline for a few days to give the claimant a chance to respond. In my experience, the claimant will drop the complaint at this point, but you're theoretically opening yourself up to legal action by sending the counternotice.
◧◩◪◨⬒
168. Fillig+mt[view] [source] [discussion] 2022-10-17 01:00:06
>>bigiai+6s
But it's not reproducing their samples. It's just adding their watermark to newly generated pictures you can't find in the training set.
replies(3): >>rovr13+Lw >>dougab+6B >>dragon+rE
◧◩◪◨⬒⬓
169. Fillig+Kt[view] [source] [discussion] 2022-10-17 01:02:46
>>matkon+Ek
If a human drew that, it would not be a copyright violation.
replies(4): >>mattkr+Kz >>Thorre+WB >>makeit+hF >>matkon+Nz5
170. dopido+Mu[view] [source] 2022-10-17 01:15:02
>>kweing+(OP)
I think you might be onto something with your conclusion.

Nonetheless, that’s problematic for folks relying on copyright as of now.

I feel for artists here; devs won’t go hungry or jobless.

◧◩◪◨⬒⬓
171. stale2+Hv[view] [source] [discussion] 2022-10-17 01:22:55
>>tremon+cg
Nobody is suing anybody over AI art yet.

Until there are a large number of court cases, the burden of proof is on you to show that this is copyright infringement.

◧◩◪◨
172. stale2+4w[view] [source] [discussion] 2022-10-17 01:25:48
>>bayind+7f
> I don't care. These are my terms.

That sounds like a you problem, not an us problem.

As of yet, no court has said that any of this is illegal.

So tough luck. Go take it to the supreme court if you disagree, because right now it actually seems like people can do almost whatever they want with these AI tools.

Your objection simply doesn't matter, until there is a court case that supports you. You can't do anything about it, if that doesn't happen.

replies(1): >>bayind+CU
173. Krishn+Gw[view] [source] 2022-10-17 01:35:09
>>kweing+(OP)
> I know artists who are vehemently against DALL-E, Stable Diffusion, etc. and regard it as stealing, but they view Copilot and GPT-3 as merely useful tools.

An example: https://twitter.com/DaveScheidt/status/1578411434043580416

> I also know software devs who are extremely excited about AI art and GPT-3 but are outraged by Copilot.

The fear is not unwarranted, though. I can clearly see AI replacing most jobs, not just in tech but also in art, crafts, music, and even science. There will probably be no field left untouched by AI this decade, and by the next decade many will be completely replaced.

We have multiple extinction events for humanity lined up: Climate Change, Nuclear Apocalypse and now AI.

We will have to work not just towards reducing harm to the planet, but also towards stopping meaningless wars and figuring out how to deal with the unemployment and economic crisis looming on the horizon. The last ones to suffer in the end would be the "elites" (or will they be the first, depending on how quickly civilization descends into anarchy?).

Can't say for sure. But definitely gloomy days ahead.

174. cercat+Kw[view] [source] 2022-10-17 01:35:30
>>kweing+(OP)
I often quote this comment regarding AI advances and jobs [0]:

> Yes, many of us will turn into cowards when automation starts to touch our work, but that would not prove this sentiment incorrect - only that we're cowards.

>> Dude. What the hell kind of anti-life philosophy are you subscribing to that calls "being unhappy about people trying to automate an entire field of human behavior" being a "coward". Geez.

>>> Because automation is generally good, but making an exemption for specific cases of automation that personally inconvenience you is rooted in cowardice/selfishness. Similar to NIMBYism.

It's true cowardice to assume that our own profession should be immune from AI while other professions are not. Either dislike all AI, or like it. To be in between is to be a hypocrite.

For me, I definitely am on the side of full AI, even if it automates my job away, simply because I see AI as an advancing force on mankind.

[0] https://news.ycombinator.com/item?id=32461138#32463198

replies(1): >>ironma+xD
◧◩◪◨⬒⬓
175. rovr13+Lw[view] [source] [discussion] 2022-10-17 01:35:40
>>Fillig+mt
If the watermark is their logo or name, it could be copyrighted or trademarked.
replies(1): >>nl+Dz
◧◩◪◨⬒
176. ghowar+Sw[view] [source] [discussion] 2022-10-17 01:36:59
>>cortes+dt
The sheer scale is what makes it worse.

Because you are right: against a few copies, even a small-time artist can fight. Against hundreds and thousands of copies, or millions, even Disney struggles. That's why Disney would go after the model itself; it scales better.

◧◩◪◨⬒⬓⬔
177. ghowar+1x[view] [source] [discussion] 2022-10-17 01:38:31
>>willia+Ns
And my advice is to read and live!

One of the reasons Roald Dahl was such a great writer is his life experiences. Read his books Boy and Solo.

◧◩◪
178. heavys+hx[view] [source] [discussion] 2022-10-17 01:40:45
>>kweing+Wp
This is a good example of sealioning.

(I kid)

◧◩◪
179. kmeist+Ax[view] [source] [discussion] 2022-10-17 01:44:28
>>rtkwe+o8
The reason why it's easy to match Copilot results back to the original source is that the users are starting with prompts that match their public code, deliberately to cause prompt regurgitation.

Stable Diffusion actually has a similar problem. Certain terms that directly call up a particular famous painting by name - say, the Mona Lisa[0] - will just produce that painting, possibly tiled on top of itself, and it won't bother with any of the other keywords or phrases you throw at it.

The underlying problem is that the AI just outright forgets that it's supposed to create novel works when you give it anything resembling the training set data. If it was just that the AI could spit out training set data when you ask for it, I wouldn't be concerned[1], but this could also happen inadvertently. This would mean that anyone using Copilot to write production code would be risking copyright liability. Through the AI they have access to the entire training set, and the AI has a habit of accidentally producing output that's substantially similar to it. Those are the two prongs of a copyright infringement claim right there.

[0] For the record I was trying to get it to draw a picture of the Mona Lisa slapping Yoshikage Kira across the cheek

[1] Anyone using an AI system to "launder" creative works is still infringing copyright. AI does not carve a shiny new loophole in the GPL.

replies(4): >>xani_+dB >>thetea+4D >>joe-co+dD >>llimll+ZD
◧◩◪◨
180. kmeist+Sx[view] [source] [discussion] 2022-10-17 01:46:49
>>bigiai+Mr
This is already a problem with anyone who ever copypastes from Stack Overflow. You're all violating CC-BY-SA[0] and nobody really cares about this.

[0] https://stackoverflow.com/help/licensing

replies(1): >>bscphi+JB
◧◩◪◨⬒⬓⬔⧯▣▦▧
181. SAI_Pe+pz[view] [source] [discussion] 2022-10-17 02:00:43
>>datafl+4n
SCO v. IBM[1] included claims of sections as small as "…ranging from five to ten to fifteen lines of code in multiple places that are of issue…" in some of the individual claims of the case.

[1] https://en.wikipedia.org/wiki/SCO_Group,_Inc._v._Internation....

replies(1): >>datafl+Rz
182. cypres+sz[view] [source] 2022-10-17 02:01:00
>>kweing+(OP)
I'm actually fine with both. I think copyright/IP related to software needs to be toned down a lot. Software patents abolished.

In my opinion, the only thing that should be an infringement regarding code is copying entire non-trivial files or entire projects outright.

A 100-line snippet should not be copyrightable. Only the entire work, which you could think of as the composition of many of those snippets.

◧◩◪◨⬒⬓⬔
183. nl+Dz[view] [source] [discussion] 2022-10-17 02:02:07
>>rovr13+Lw
And it's the responsibility of the person using the tool to generate that image not to violate copyright by redistributing it.
replies(2): >>behrin+nC >>MereIn+KC
◧◩◪◨⬒⬓⬔
184. mattkr+Kz[view] [source] [discussion] 2022-10-17 02:02:54
>>Fillig+Kt
I’m not so sure about that.

The scenes à faire doctrine would certainly let you paint your own picture of a pretty girl with a large earring, even a pearl one. That, however, is definitely the same person, in the same pose/composition, in the same outfit. The colors are slightly off, but the difference feels like a technical error rather than an expressive choice.

replies(2): >>Thorre+MB >>boulos+TB
◧◩◪◨⬒⬓⬔⧯▣▦▧▨
185. datafl+Rz[view] [source] [discussion] 2022-10-17 02:04:06
>>SAI_Pe+pz
The "..." part you redacted out explicitly said "it is many different sections of code". It was (quite obviously) not one or two 5-line blocks of code, let alone "simple" ones like FizzBuzz.
replies(1): >>Americ+SM
◧◩
186. BeFlat+Vz[view] [source] [discussion] 2022-10-17 02:04:20
>>ghowar+E2
Thankfully, Stable Diffusion is on thousands of hard drives, so the genie can't be put back in the bottle.
replies(1): >>Gigach+JE
◧◩◪◨
187. nl+Xz[view] [source] [discussion] 2022-10-17 02:04:26
>>paulgb+Fj
> In those cases, it seems like not such a leap to say that the AI has obviously seen that artist’s work and that the output is a derivative work.

"Copying" a style is not a derivative work:

> Why isn't style protected by copyright? Well for one thing, there's some case law telling us it isn't. In Steinberg v. Columbia Pictures, the court stated that style is merely one ingredient of expression and for there to be infringement, there has to be substantial similarity between the original work and the new, purportedly infringing, work. In Dave Grossman Designs v. Bortin, the court said that:

> "The law of copyright is clear that only specific expressions of an idea may be copyrighted, that other parties may copy that idea, but that other parties may not copy that specific expression of the idea or portions thereof. For example, Picasso may be entitled to a copyright on his portrait of three women painted in his Cubist motif. Any artist, however, may paint a picture of any subject in the Cubist motif, including a portrait of three women, and not violate Picasso's copyright so long as the second artist does not substantially copy Picasso's specific expression of his idea."

https://www.thelegalartist.com/blog/you-cant-copyright-style

◧◩◪
188. kmeist+4A[view] [source] [discussion] 2022-10-17 02:06:31
>>insani+N4
>That can't possibly be a valid claim, right?

For literally everything but music, yes.

Even by the standards of copyright technicality, music copyright is weird. For example, if you ask a lawyer[0] what parts of copyright set it apart from other forms of property law[1], they would probably answer that it's federally preempted[2] and that it has constitutionally-mandated term limits.

Which, of course, is why music has a second "recording copyright", which was originally created by states assigning perpetual copyright to sound recordings. I wish I was making this up.

So the musical arrangement that constitutes that song from 1640? Absolutely public domain. You can tell people how to play Monteverdi all damned day. But every time you record that song being played, that creates a new copyright on that recording only. This is analogous to how making a cartoon of a public-domain fairy tale gives you ownership over that cartoon only. Except because different performers are all trying to play the same music as perfectly as possible, the recordings will sound the same and trip a Content ID match.

Oh, and because music copyright has two souls, the Sixth Circuit said there's no de minimis for sampling. That's why sample-happy rap is dead.

If you want public domain music on your YouTube video you either record it yourself or license a recording someone else did. I think there are CC recordings of PD music but I'm not sure. Either way you'll also need to repeatedly prove this to YouTube staff that would much rather not have to defend you against a music industry that's been out for blood for half a century at this point.

[0] Who, BTW, I am very much NOT

[1] Yes, yes, I know I'm dangerously close to uttering the dangerous propaganda term "intellectual property". You can go back to bed Mr. Stallman.

[2] Which means states can't make their own supra-federal copyright law and any copyright suit immediately goes to federal court.

◧◩◪
189. BeFlat+aA[view] [source] [discussion] 2022-10-17 02:07:22
>>naillo+m2
Laws in which nation and enforced by which juries?
◧◩◪◨⬒
190. didibu+sA[view] [source] [discussion] 2022-10-17 02:09:54
>>rtkwe+En
I think it's more a question of derivative work. Normally derivative work is an infringement unless it falls under fair use.

Now a human can take inspiration from like 100 different sources and probably end up with something that no one would recognize as derivative to any of them. But it also wouldn't be obvious that the human did that.

But with an ML model, it's clearly a derivative in that the learned function is mathematically derived from its dataset and so is all the resulting outputs.

I think this brings up a new question, though, because until now "derivative" kind of implied that the output was recognizable as being derived.

With AI, you can tweak it so the output doesn't end up being easily recognizable as derived, but we know it's still derived.

Personally I think what really matters is more a question of what should be the legal framework around it. How do we balance the interests of AI companies and that of developers, artists, citizens who are the authors of the dataset that enabled the AI to exist. And what right should each party be given?
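To make "mathematically derived from its dataset" concrete, here's a toy sketch of mine with closed-form least squares standing in for real training (not how these models actually train, just the same dependency in miniature):

```python
# Toy "training": fit y = w * x by least squares, in closed form.
# The learned weight w is a pure function of the dataset -- change or
# remove any training example and the learned function, and therefore
# every one of its outputs, changes with it.
def train(data):
    # data: list of (x, y) pairs
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

w = train([(1, 2), (2, 4), (3, 6)])  # dataset encodes y = 2x

def predict(x):
    return w * x
```

Gradient descent on millions of parameters is the same idea at scale: the weights, and hence every output, are a deterministic function of the training set (plus initialization and sampling noise).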

replies(1): >>rtkwe+jX1
◧◩◪◨⬒⬓⬔
191. hacker+xA[view] [source] [discussion] 2022-10-17 02:10:29
>>CapsAd+fr
This is just nonsense.

It's similar to saying that any digital representation of an image isn't an image just a dataset that represent it.

If what you said were any sort of defense, copyright would never apply to any digital image, because images can be saved in different resolutions, different file formats, or re-encoded; e.g. if a JPEG "image" were only an image at one exact set of bits, I could save it again with a different quality setting and end up with a different set of bits.

But everyone still recognises when an image looks the same, and courts will uphold copyright claims regardless of the digital encoding of an image. So good luck with that spurious argument that it's not a copyright violation because "it's on the internet" (oh, it's with AI, etc.).

replies(1): >>CapsAd+0E
◧◩◪◨⬒⬓⬔
192. roboca+4B[view] [source] [discussion] 2022-10-17 02:15:11
>>willia+Ns
Imagine telling someone who wanted to learn a sport to watch it. I define someone that writes as a writer. It is the act of writing that enables you to then read and learn from others.

An example: a dyslexic friend and a dyslexic family member. The written communication skills of both are now fine, in part because their jobs required it of them (and in part because technology helps). I also had an illiterate friend who taught himself to read and write as an adult (basic written communication), due to the needs of his job. Learn by doing, and add observation of others as an adjunct to help you. Even better if you can get good coaching (which requires effort at your craft or sport).

Disclaimer: never a writer. Projecting from my other crafts/sports. Terribly written comment!

replies(1): >>ghowar+jE
◧◩◪◨⬒⬓
193. dougab+6B[view] [source] [discussion] 2022-10-17 02:15:24
>>Fillig+mt
If it faithfully memorized and reproduced a set of watermarks, it would be premature to conclude that it hadn’t memorized other (non-generic) graphical elements.
◧◩◪◨
194. xani_+dB[view] [source] [discussion] 2022-10-17 02:16:14
>>kmeist+Ax
> The reason why it's easy to match Copilot results back to the original source is that the users are starting with prompts that match their public code, deliberately to cause prompt regurgitation.

The reason doesn't really matter...

replies(1): >>lofatd+lD
◧◩◪◨
195. didibu+lB[view] [source] [discussion] 2022-10-17 02:17:49
>>Americ+Ai
I think the issue people have is that most developers implementing FizzBuzz will not have studied all the existing public copyrighted implementations. They will likely reinvent the solution, having perhaps never seen an existing FizzBuzz implementation (or only one or two at most), and probably won't reproduce one verbatim.

But the machine learning model has studied every single one of them.

And maybe more telling: if its dataset contained no FizzBuzz implementation, would it even be able to invent one?

I feel this is the big distinction that probably annoys people.

That, and the general worry that it'll devalue the experienced developer: AI will make hard things easier and require less effort and talent to learn, leaving developers in lower demand and probably lower paid.
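For illustration, the canonical solution nearly everyone converges on is just a few lines (a sketch in Python, chosen arbitrarily), which is why near-verbatim overlap on snippets like this is almost inevitable:

```python
def fizzbuzz(n: int) -> str:
    # The structure almost every implementation shares:
    # check divisibility by 15 first, then 3, then 5.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

print(" ".join(fizzbuzz(i) for i in range(1, 16)))
# → 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 FizzBuzz
```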

196. sander+HB[view] [source] 2022-10-17 02:22:34
>>kweing+(OP)
For what it's worth, I think it's all very impressive and amazing, but also really sketchy. Or at least, I think the developers of these systems need to be very careful about what they are allowed to do with what content, and because of articles like this one and others, I don't trust that they are.
◧◩◪◨⬒
197. bscphi+JB[view] [source] [discussion] 2022-10-17 02:22:49
>>kmeist+Sx
If I ever take any code from SO, I include a comment with a link to it. Surely that's standard practice for anything longer than a line or two?
replies(1): >>fourth+EM
◧◩◪◨⬒⬓⬔⧯
198. Thorre+MB[view] [source] [discussion] 2022-10-17 02:23:22
>>mattkr+Kz
Even if it is an expressive choice of the new artist, if enough of the original artist's expressive choice remains, it could still be a copyright violation. Fair use can sometimes be a defense, but there are a lot of factors that go into determining whether something is fair use.
◧◩◪◨⬒⬓⬔⧯
199. boulos+TB[view] [source] [discussion] 2022-10-17 02:24:02
>>mattkr+Kz
Really? It looks like some bad Warhol take on the Vermeer original.
replies(1): >>mattkr+nM
◧◩◪◨⬒⬓⬔
200. Thorre+WB[view] [source] [discussion] 2022-10-17 02:25:24
>>Fillig+Kt
Why? Obviously it wouldn't be a copyright violation, because the original one is old enough to no longer be copyrighted. But other than age?
replies(1): >>atchoo+OT
◧◩◪◨⬒⬓⬔⧯
201. behrin+nC[view] [source] [discussion] 2022-10-17 02:30:08
>>nl+Dz
The tool is already redistributing it.

A broadcaster of copyrighted works is not protected against infringement just because they expect viewers to only watch programming they own.

replies(1): >>rtkwe+TA1
202. sineno+tC[view] [source] 2022-10-17 02:30:59
>>kweing+(OP)
I know far fewer programmers offended by Copilot than artists offended by Stable Diffusion.

This is a mostly irrelevant red herring that sets professions against each other. Instead we should cooperate on the costly yet necessary decision to institute a basic income, prioritizing the professions about to be superseded by modern ML.

Obviously, our decision-making class views the topic of instituting a realistic basic income right now as something extremely unpleasant, and so it goes.

People who helped to bootstrap the AI should be compensated, at the very least by being able to live a modest lifestyle without having to work. Simple as.

◧◩◪◨⬒
203. Samoye+yC[view] [source] [discussion] 2022-10-17 02:32:02
>>insani+s5
If you sing over a song you’re adding your own voice. If you photograph a building that’s your own photograph, where decisions like lighting and framing are creative choices. If you paint a picture of a building that’s your own picture.

An artist should credit when they are directly taking from another artist. Erasure poems don’t quite work if the poet runs around claiming they created the poem that was being erased.

But more importantly, SD allows you to take existing copyrighted works, launder them, and pass them off as your own, even though you don’t own the rights to that work. This would be more akin to my taking a photograph you made and selling it on a t-shirt on Redbubble. I don’t actually own the IP to do that.

◧◩◪◨⬒⬓⬔⧯
204. MereIn+KC[view] [source] [discussion] 2022-10-17 02:34:53
>>nl+Dz
Just like it's the person's responsibility to only recombine jpeg basis states when they don't correspond to a copyrighted image? It seems more and more to be the case that the trained model is, in large part, a very compact representation of the training data. I'm not seeing a difference between distributing a model that can be used to reconstruct the input images, as opposed to distributing jpeg basis states that can be used to reconstruct the original image.
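To make the analogy concrete, here is a toy version with made-up numbers: a 4-sample "image" is transformed into coefficients in a fixed DCT basis (the transform underlying JPEG), and the basis plus those coefficients reconstruct the original exactly.

```python
import math

N = 4

def dct_basis(k: int, n: int) -> float:
    # Orthonormal DCT-II basis function k evaluated at sample n.
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos(math.pi * k * (2 * n + 1) / (2 * N))

signal = [52.0, 55.0, 61.0, 66.0]  # made-up pixel values

# Analysis: project the signal onto the basis to get coefficients...
coeffs = [sum(dct_basis(k, n) * signal[n] for n in range(N)) for k in range(N)]
# ...synthesis: the basis plus the coefficients give back the exact signal.
rebuilt = [sum(dct_basis(k, n) * coeffs[k] for k in range(N)) for n in range(N)]

print([round(v, 6) for v in rebuilt])  # → [52.0, 55.0, 61.0, 66.0]
```

The argument above is that distributing "basis plus coefficients" is distributing the image; the open question is how far that analogy stretches when the "coefficients" are model weights shared across millions of training images.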
◧◩◪◨
205. thetea+4D[view] [source] [discussion] 2022-10-17 02:38:03
>>kmeist+Ax
> The reason why it's easy to match Copilot results back to the original source is that the users are starting with prompts that match their public code, deliberately to cause prompt regurgitation.

Sounds like MS has devised a massive automated code laundering racket.

replies(1): >>ISL+bF
◧◩◪◨
206. joe-co+dD[view] [source] [discussion] 2022-10-17 02:39:45
>>kmeist+Ax
I think that's backwards. The AI doesn't "forget"; it never even knew what novelty was in the first place.
◧◩◪◨⬒
207. lofatd+lD[view] [source] [discussion] 2022-10-17 02:41:10
>>xani_+dB
GP is just highlighting why this is so common and often a challenging edge case. If you ask it for something that's exactly in its dataset, the "best" solution that minimizes loss will be that existing code. Thus, it's somewhat intrinsic to applying statistical learning to text completion.

This means MS really shouldn't have used copyleft code at all, and really shouldn't be selling Copilot in this state, but "luckily" for them, short of a class-action suit I don't really see any recourse for the programmers whose work they're reselling.
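A degenerate illustration of the loss-minimization point: at the limit, a "model" that memorizes every (prompt, continuation) pair from training achieves zero training loss, and regurgitation is exactly what that looks like. The prompts and continuations below are made up.

```python
# A zero-training-loss "completion model": memorize every
# (prompt, continuation) pair. Any prompt that appears verbatim in the
# training set gets its training continuation back verbatim.
training_data = {
    "float Q_rsqrt(float number) {": "    long i; float x2, y; ...",
    "def fizzbuzz(n):": "    if n % 15 == 0: return 'FizzBuzz' ...",
}

def complete(prompt: str) -> str:
    # Returning the memorized continuation minimizes loss on the
    # training set; that behavior *is* prompt regurgitation.
    return training_data.get(prompt, "# (no memorized continuation)")

print(complete("def fizzbuzz(n):"))
```

Real models interpolate rather than look up, but the pressure is the same: for a prompt that exists verbatim in the dataset, the loss-minimizing completion is the existing code.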

replies(2): >>fweime+VL >>kmeist+A23
◧◩
208. ironma+xD[view] [source] [discussion] 2022-10-17 02:44:02
>>cercat+Kw
It’s not hypocrisy to think some jobs shouldn’t be automated. I don’t teach, but I definitely want human teachers teaching my kin, not AI teachers.
replies(1): >>cercat+EG
◧◩◪◨
209. llimll+ZD[view] [source] [discussion] 2022-10-17 02:49:33
>>kmeist+Ax
I tried some very simple queries with Copilot on random stuff and tried to trace the output back to the source. I was successful about 1/3 of the time.

(Sorry I didn't log my experiment results at the time. None of it was related to work I'd done - I used time adjustment functions if I remember correctly)

◧◩◪◨⬒⬓⬔⧯
210. CapsAd+0E[view] [source] [discussion] 2022-10-17 02:49:37
>>hacker+xA
I don't understand what is nonsense about it, or how. Your response seems to be about something entirely different.

But anyway, how I see stable diffusion being different is that it's a tool to generate all sorts of images, including copyrighted images.

It's more like a database of *how to* generate images than a database *of* images. Maybe there isn't that much of a difference when it comes to copyright law. If you ask an artist to draw a copyrighted image for you, who should be in trouble? I'd say the person asking, most of the time, but in this case we argue it's the people behind the pencil or whatever. Why? Because it's too easy? Where does a service like Fiverr stand here?

So if a tool is able to generate something that looks indistinguishable from some copyrighted artwork, is it infringing on copyright? I can get on board with yes if it was trained on that copyrighted artwork, but otherwise I'm not so sure.

replies(1): >>rfrec0+6L
◧◩◪◨⬒⬓⬔⧯
211. ghowar+jE[view] [source] [discussion] 2022-10-17 02:53:01
>>roboca+4B
Coming from an actual, though unpublished, writer: you are right.
◧◩◪◨⬒⬓
212. dragon+rE[view] [source] [discussion] 2022-10-17 02:55:05
>>Fillig+mt
The watermark of a stock photo service is usually copyright protected, and also a (usually registered) trademark.
◧◩◪
213. Gigach+JE[view] [source] [discussion] 2022-10-17 02:59:47
>>BeFlat+Vz
I don’t think anyone believes it’s possible. Even the “ai ethicists” have to realise this. We can still acknowledge that these tools can be bad for society while knowing they can’t be stopped.
◧◩◪
214. Gigach+RE[view] [source] [discussion] 2022-10-17 03:01:28
>>epolan+Lh
Not the OP, but I have a sinking feeling that these AI tools are going to take away the most enjoyable careers and creative pursuits and leave us with only mundane button-pusher AI-supervisor jobs.

Current AI is not replacing anything yet, but I feel we are only a few years away from AI doing a better job at drawing or programming than someone with years of practice. Sure, you can utilise those tools to stay ahead. But will AI prompt engineering be as emotionally satisfying as drawing for real?

replies(1): >>woah+SF
◧◩◪◨⬒
215. ISL+bF[view] [source] [discussion] 2022-10-17 03:06:23
>>thetea+4D
Seems more like a massive class-action copyright target, potentially at ($50k/infraction) x (the number of usages).
replies(2): >>bugfix+jN >>thetea+ia1
◧◩◪◨⬒⬓⬔
216. makeit+hF[view] [source] [discussion] 2022-10-17 03:08:20
>>Fillig+Kt
If the original art is still copyrighted, and you’d start selling your hand drawn variation, you’d totally be violating the copyright.

To make it concrete, imagine the latest Disney movie poster. You redraw it 95% close to the original, just changing the actual title. Then you sell your poster on Amazon at half the price of the official one. Would you get a copyright strike?

217. sicp-e+yF[view] [source] 2022-10-17 03:11:46
>>kweing+(OP)
Does it cost money to produce high quality training sets? Yes. Would an organization or individual be willing to pay for samples for their data set? Absolutely. It seems pretty easy to discern that value is being taken from people.
◧◩◪◨
218. woah+SF[view] [source] [discussion] 2022-10-17 03:17:28
>>Gigach+RE
It seems like these AI tools, if anything, will take away the least enjoyable parts of creative careers. Artists will spend less time thinking about how to adjust a camera lens or mix paints, and more time thinking about how to tell a story that connects with people. Programmers will spend less time banging out boilerplate, and more time thinking about system design.
replies(1): >>Gigach+vk1
219. stevew+1G[view] [source] 2022-10-17 03:20:13
>>kweing+(OP)
The last time I happened to point this out[1], all I got was a bunch of HNers nitpicking the words I chose, but not addressing the core issue.

I have to assume this is just people being protective of their own profession and, consequently, setting a high bar for what constitutes performance in that profession.

[1] https://news.ycombinator.com/item?id=32895251#32895709

◧◩◪◨
220. foobar+jG[view] [source] [discussion] 2022-10-17 03:24:44
>>Quantu+6j
Interesting theory. I'll have to test that.

Oh, actually I remember now -- I think the copyright complaint specifically said what recording they thought I was infringing, and it was the correct piece.

◧◩◪
221. cercat+EG[view] [source] [discussion] 2022-10-17 03:29:01
>>ironma+xD
Perhaps, or perhaps not. We have not yet seen the true reach of AI pedagogy. If AI can teach better than humans (something like the Matrix's brain-upload training), then I would rather have that than a human teaching me.
◧◩◪
222. dv_dt+TI[view] [source] [discussion] 2022-10-17 04:01:25
>>rtkwe+o8
I suspect it’s going to be a discussion similar to the introduction of music sampling, followed by a lot of litigation, followed by a settling of law on the matter.

The interesting part is whether AI will be considered a tooling mechanism, much like the tooling used to record and manipulate a music sample into a new composition.

◧◩◪◨⬒⬓
223. blende+fJ[view] [source] [discussion] 2022-10-17 04:06:56
>>rtkwe+gn
Have you been following the Andy Warhol Prince drawing case?

It is currently at SCOTUS, so we should see a ruling for the USA sometime in the next year or so.

https://en.m.wikipedia.org/wiki/Andy_Warhol_Foundation_for_t...

replies(1): >>rtkwe+KC1
◧◩◪◨⬒
224. rfrec0+bK[view] [source] [discussion] 2022-10-17 04:20:45
>>Spivak+Ak
Right, but what if you commission an artist to create a work similar to an already existing piece of art, and the artist decides that the most efficient way to do that is to just place the original in a photocopier, crop out the copyright notice and the original artist's signature, and sell you the resulting print?
replies(1): >>rtkwe+pJ1
◧◩◪◨⬒⬓⬔⧯▣
225. rfrec0+6L[view] [source] [discussion] 2022-10-17 04:34:10
>>CapsAd+0E
A tool can't be held accountable and can't infringe on copyright, or any other law for that matter; it's more of a product. It seems to me like a gray area that's just going to have to be decided in court: did the company selling a tool that can very easily be used to do illegal things take reasonable measures to prevent it from being accidentally used that way? In the case of Copilot, I don't believe so, because there aren't even adequate warnings telling the end user that it can produce code which can only legally be used in software that meets the criteria of the original license.
replies(2): >>omnimu+wd1 >>sidewn+IM1
◧◩◪◨⬒⬓⬔
226. pclmul+7L[view] [source] [discussion] 2022-10-17 04:34:42
>>tremon+pj
You can produce a public domain work using content that you have fair use rights to. The original owner of the content you are using fairly has no claim of ownership. You would have to assert that right in court if the owner of the copyright came after you, but that does not preclude the possibility of making a public-domain work with other copyrights used in fair use.

Obviously, that would not entitle anyone to rip those elements from your work and use them in a way that was not fair use. The Getty watermark could fall into this category: public domain pictures using the watermark fairly (for transformative commentary/satire purposes) could go into the network, which uses that information to produce infringing images.

Trademarks are a different story, but trademark protections are a lot narrower than you might think.

The point is that it's very conceivable that the neural network is being trained to infringe copyrights by training entirely with public-domain images.

◧◩◪
227. kube-s+aL[view] [source] [discussion] 2022-10-17 04:35:46
>>insani+N4
The copyright on the original composition may be expired, but there are many people who make new recordings of that piece and those recordings are copyrighted. While it is entirely legal to record your own rendition, YouTube’s automated Content ID is dumb and often can’t tell the difference between your recording and some other contemporary recording.
228. orbita+BL[view] [source] 2022-10-17 04:40:59
>>kweing+(OP)
There's a substantial difference between being trained and being overfit to the point of repeating training data 1:1. Overtraining is a bug in a model, not a feature. For example, Stable Diffusion 1.4 is overtrained on one specific Aivazovsky painting (among some others by other authors, like the Mona Lisa, or Sunflowers - Van Gogh painted several of those). Copilot was famously overtrained on the Quake III fast inverse square root code (popularly attributed to Carmack), so they had to block it programmatically after receiving bad publicity. Neither is intended by the model authors; this is a flaw.
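For reference, the snippet in question, the widely reproduced fast inverse square root, is distinctive precisely because of its magic constant. A from-memory sketch in Python of what the original C code does:

```python
import struct

def q_rsqrt(x: float) -> float:
    """Approximate 1/sqrt(x) via the famous bit-level hack."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    i = 0x5F3759DF - (i >> 1)  # the magic constant
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson refinement step, as in the original.
    return y * (1.5 - 0.5 * x * y * y)

print(round(q_rsqrt(4.0), 3))  # → 0.499 (exact value is 0.5)
```

A snippet this idiosyncratic, comments and constant included, is exactly the kind of output that is trivially traceable back to one source, which is what made the regurgitation so visible.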
◧◩◪◨⬒⬓
229. fweime+VL[view] [source] [discussion] 2022-10-17 04:44:42
>>lofatd+lD
Pretty much all code they have requires attribution, and based on reports, Copilot does not generate that along with the code. So excluding copyleft code (how would you even do that?) does not address the issue (assuming that the source code produced is actually a derivative work).
replies(1): >>lofatd+GP
◧◩◪◨⬒⬓⬔⧯▣
230. mattkr+nM[view] [source] [discussion] 2022-10-17 04:50:17
>>boulos+TB
That’s a really apt comparison, since the Supreme Court just heard Andy Warhol Foundation for the Visual Arts v. Goldsmith, which hinges on whether Warhol’s use of a copyrighted photo of Prince as the basis for “Orange Prince” was Fair Use.

Warhol’s estate seems likely to lose and their strongest argument is that Warhol took a documentary photo and transformed it into a commentary on celebrity culture. Here, I don’t even see that applying: it just looks like a bad copy.

https://www.scotusblog.com/2022/10/justices-debate-whether-w...

◧◩◪◨⬒⬓
231. fourth+EM[view] [source] [discussion] 2022-10-17 04:55:14
>>bscphi+JB
I do the same. I think it satisfies BY (attribution) but not SA (Share Alike).

As GP says, no one really cares, but it seems hard to satisfy SA... even if you are pasting into open source, is your license compatible with CC?

Perhaps I'm over-thinking this.

◧◩◪◨⬒⬓⬔⧯▣▦▧▨◲
232. Americ+SM[view] [source] [discussion] 2022-10-17 04:57:16
>>datafl+Rz
So your claim is that the code in the OP tweet is actually not copyrightable, and it would only become a copyright violation if you also copied many additional code blocks from the same copyrighted work?
◧◩◪◨⬒⬓
233. bugfix+jN[view] [source] [discussion] 2022-10-17 05:03:53
>>ISL+bF
Good. Where do I sign up?
replies(1): >>ISL+UF3
◧◩
234. orbita+tN[view] [source] [discussion] 2022-10-17 05:07:21
>>dawner+52
This is like saying "we need a regulation around bugs in software", with similar consequences. ML models are generally too large to ensure that there are no bugs. Same with software.
◧◩◪◨
235. csalle+8P[view] [source] [discussion] 2022-10-17 05:39:12
>>ghowar+4q
This isn't super convincing to me. You're basically predicting that some new innovation will be limited in its usefulness, but you have no real way of knowing that, because the variables are too complex.

This is why we have a market. We let billions of individuals vote on what they think is useful or not, in real-time, multiple times a day, every day. If AI-generated images are less desirable than what came before, then people won't use them or pay to use them in the long run. They'll die like other flash-in-the-pan fads have died, artists will retain their jobs en masse, and OpenAI won't gain much if any power.

The entire idea of the market is to ensure that if some entity is gaining money/power, that's happening as a result of it providing some commensurate good to the people. And if that's not happening, or if the power is too great, that's why we have laws and regulatory bodies.

◧◩◪◨⬒⬓⬔
236. lofatd+GP[view] [source] [discussion] 2022-10-17 05:46:08
>>fweime+VL
That's a good point. I was thinking that during the curation phase of the dataset they should check for a LICENSE.txt file in the repo and batch-exclude all repositories with copyleft or restrictive licenses. This obviously won't handle every case, as you say, and when the model does generate copyleft code it will still fail to attribute it, but hopefully having less (or no) copyleft code in the dataset reduces the chance it generates code that perfectly satisfies its loss function by being exactly like something it's seen before.

The main problem I see with generating attribution is that the algorithm obviously doesn't "know" that it's generating identical code. Even in the original Twitter post, the algorithm makes subtle, essentially semantically synonymous changes (like changing the commenting style). So for all intents and purposes it can't attribute the function, because it doesn't know _where_ the code is coming from, and copied code is indistinguishable from de novo code. Copilot will probably never be able to attribute code short of exhaustively checking its outputs against a database of copyleft/copyrighted code using some symbolic approach.
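A minimal sketch of that curation pass; the marker strings and repository names below are invented purely for illustration:

```python
# Hypothetical dataset-curation pass: drop any repository whose
# license text looks copyleft before it reaches training.
COPYLEFT_MARKERS = (
    "GNU GENERAL PUBLIC LICENSE",
    "GNU AFFERO",
    "GNU LESSER",
    "MOZILLA PUBLIC LICENSE",
)

def is_copyleft(license_text: str) -> bool:
    # Case-insensitive substring match against known copyleft headers.
    text = license_text.upper()
    return any(marker in text for marker in COPYLEFT_MARKERS)

def filter_repos(repos: dict) -> list:
    """Keep only repos whose LICENSE text has no copyleft marker."""
    return [name for name, lic in repos.items() if not is_copyleft(lic)]

repos = {
    "example/gpl-project": "GNU GENERAL PUBLIC LICENSE Version 3 ...",
    "example/mit-project": "MIT License: Permission is hereby granted ...",
}
print(filter_repos(repos))  # → ['example/mit-project']
```

As noted above, this is necessarily incomplete: per-file license headers, vendored third-party code, and attribution-only licenses (which Copilot also fails to satisfy) all slip through a repo-level check.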

◧◩◪◨⬒
237. amanuo+qT[view] [source] [discussion] 2022-10-17 06:26:15
>>willia+Y6
> I'm not sure sure that originality is that different between a human and a neural network.

It is, yes. For example, a neural network can't invent a new art style on its own, or at least existing models can't, they can only copy existing art styles, invented by humans.

◧◩◪◨⬒⬓⬔⧯
238. atchoo+OT[view] [source] [discussion] 2022-10-17 06:32:28
>>Thorre+WB
The photograph of the art, which will be more recent, might have copyright protections.

It looks like it wouldn't in the UK, probably wouldn't in the US, but would in Germany. The cases seem to hinge on the level of intellectual creativity involved in the photograph. The UK said that trying to create an exact copy was not an original endeavour, whereas Germany said the task of exact replication requires intellectual/technical effort of its own merit.

https://www.theipmatters.com/post/are-photographs-of-public-...

◧◩◪◨⬒
239. bayind+CU[view] [source] [discussion] 2022-10-17 06:41:39
>>stale2+4w
> That sounds like a you problem, not a us problem.

This stance would allow me to do whatever I want with any software or work you put out there, regardless of the license you attach to it, since it's your problem, not mine.

However, that is not how I operate ethically.

> As of yet, no court has said that any of this is illegal.

I assume this will be tested somehow, sometime. So I'm investing in popcorn futures.

> Your objection simply doesn't matter, until there is a court case that supports you. You can't do anything about it, if that doesn't happen.

You know, this goes both ways. The same will apply to your works, by your own reasoning.

replies(1): >>stale2+jW
◧◩◪◨⬒
240. bayind+2W[view] [source] [discussion] 2022-10-17 06:59:05
>>psychp+Bh
> It's funny to say id's fast inverse square root. Conway certainly didn't come up with the algorithm or the magic number.

I'm not claiming that they did. What I said is that Copilot emitted the exact implementation from id's repository, including all comments and everything.

> But your reasoning boils down to I don't like it so it mustn't be that way. That's never been necessarily true.

If you interpret my comment with that depth and breadth, I can only say that you are misinterpreting completely. It's not about my personal tastes, it's about ethical frameworks and social contracts.

> At any rate piracy is rampant so clearly a large body of people don't think even a direct copies is morally wrong. Let alone something similar.

I believe if you listen to a street musician for a minute, you owe them a dollar. Scale up from there. BTW, I'm a former orchestra player, so I know what making and performing music entails.

> You're acting as though there are constant won and lost cases over plagiarism. Ed Sheeran seems to defend his work weekly. Every case that goes to court means reasonable minds differ on the interpretation of plagiarism legally.

When there's a strict license on how a work can be used, and the license is violated, it's a clear case. This AI is just a derivation engine, and the derivations carry the same license. I don't care if you derive from my code. I care if you derive from my code and hide the derivations from the public.

It's funny that you're defending closed-sourcing free software at this point. This is a neat full circle.

> So what's your point?

All research and science should be ethical. AI research is not something special that allows these ethical norms and guidelines (established over decades, if not centuries) to be suspended. If medical researchers acted with a quarter of this laissez-faire attitude, they'd suffer a slow professional death. If security researchers acted with an eighth of this recklessness, their careers would be ruined.

> That's all you got to hold back the tide of AI?

As I mentioned, I'm not against AI. It just doesn't excite me as a person who knows how it works and what it does, and the researchers' attitude leaves a bad taste in my mouth.

◧◩◪◨⬒⬓
241. stale2+jW[view] [source] [discussion] 2022-10-17 07:01:59
>>bayind+CU
> this stance allows me to do whatever do I want with any software

Actually, no it doesn't. This topic is about AI training on code.

Courts have not held that this is illegal.

But there are absolutely other things, that people might do with code, that break copyright law.

> it's your problem, not mine.

Oh, but it would be your problem as well, if you break the law, and someone else sues you for it.

That's the difference. AI training is not against the law. Other things, that you are imagining in your head right now, very well could be, and you could lose.

> Same will be very valid for your works

Not if what you are hypothetically doing breaks the law, and AI training doesn't break the law.

So that's the difference, which makes the reasoning legitimate.

replies(1): >>bayind+oY
◧◩◪
242. bayind+xW[view] [source] [discussion] 2022-10-17 07:03:36
>>Aeolun+tq
> That’s actually fine...

Actually, yes. I'm not against the tech. I'm against my code being used without consent for a tool that makes it possible to breach the license I put my code under.

IOW, if Copilot understood code licenses and prevented intermixing incompatibly licensed code when emitting results for my repository, I might have a slightly different stance on the issue.

◧◩◪◨⬒⬓
243. Stagna+yX[view] [source] [discussion] 2022-10-17 07:12:56
>>ghowar+ug
>The way to tell if something was in the dataset would be to use the name of a famous Disney character and see what it pulls up.

I tried out of curiosity. Here[1] are the first 8 images that came up with the prompt "Disney mickey mouse" using the stable diffusion V1.4 model. Personally I don't really see why Disney or any other company would take issue with the image generation models, it just seems more or less like regular fan art.

[1]: https://i.imgur.com/cIHBCRe.png

◧◩◪◨⬒⬓⬔
244. bayind+oY[view] [source] [discussion] 2022-10-17 07:22:40
>>stale2+jW
> Courts have not held that this is illegal.

Laws are just a codified version of ethics. Just because something is not codified in law doesn't mean it's ethically correct, and I hold ethics above laws. Some people call this conscience, others call it honor.

Just because something isn't deemed illegal doesn't make it ethical. These are different things. The world worked under honor and ethical codes for a very long time, and still works under these unwritten laws in a lot of areas.

Science, software and other frontiers value ethics and principles a great deal. Some niches like AI largely ignore these, and I find this disturbing.

However, some people prefer to play the game with the written rules only, and as I said, I'm investing in popcorn futures to see what's gonna happen to them.

I might tank and go bankrupt of course, but I will sleep better at night for sure, and this is more important for me at the end.

I'm passionate about computers, yes. This is also my job, yes, but I'm not the person who'll do reckless things just because an incomplete written code of ethics doesn't prevent me from doing them.

I'd rather not do anything to anyone which I don't want to receive. IOW, I sow only the seeds which I want to reap.

replies(1): >>stale2+2t1
◧◩◪◨⬒⬓
245. thetea+ia1[view] [source] [discussion] 2022-10-17 09:29:37
>>ISL+bF
Both.
◧◩◪◨⬒⬓⬔⧯▣▦
246. omnimu+wd1[view] [source] [discussion] 2022-10-17 10:12:54
>>rfrec0+6L
The issue is not what it produces. Copilot, I am sure, has safeguards to avoid outputting copyrighted code (they even mention they have tests), so it will sufficiently change the code to be legally safe.

The issue is in how it creates the output. Both DALL-E and Copilot can only work by taking the past work of people, sucking up their hard-earned know-how and creations, and remixing it, all while not crediting (or paying) anyone. The software itself might be great, but it only works because it was fed loads of quality material.

It's smart copy & paste with obfuscation. If that's OK legally, you can imagine it soon being used to rewrite whole codebases while avoiding any copyright. All the code will technically be different, but also the same.

◧◩◪◨⬒
247. Gigach+vk1[view] [source] [discussion] 2022-10-17 11:16:53
>>woah+SF
AI isn’t replacing camera usage or paint mixing. It’s replacing the years of learning to draw with prompt engineering. Like how the camera obsoleted the art of realistic painting, I feel that AI will replace the art of drawing. Which is something I like doing. But I like doing it because it’s hard but with purpose.

It’s not satisfying to painstakingly work on something that I could have generated with an AI in seconds.

◧◩◪◨⬒⬓⬔⧯
248. stale2+2t1[view] [source] [discussion] 2022-10-17 12:24:55
>>bayind+oY
> Laws are just codified version of ethics.

And a quite reasonable code of ethics is that people do not have absolute, complete control over their intellectual property, and instead only have the ability to control it in certain circumstances.

Things like fair use, which makes this legal, exists for many very good reasons.

So yes, the code of ethics that society has decided on includes perfectly reasonable exceptions, such as fair use, and it is your problem, not ours, that you have some ridiculous idea that people should have complete, 100% authoritarian control over their IP.

And no, people not having infinite control over IP, does not allow you to extend this reasonable exception, to you being able to do literally anything to other people's IP.

replies(1): >>bayind+Ww1
◧◩◪◨⬒⬓⬔⧯▣
249. bayind+Ww1[view] [source] [discussion] 2022-10-17 12:55:44
>>stale2+2t1
You're completely right. My premise extends no further than the (court-tested, honored) license I attach to my code.

What I say with the GPL license is clear:

If you derive anything from this code base, you agree and are obliged to carry this license over to the target code base (the logical unit in most cases being a function).

So the case is clear. AI is a derivation engine. What you obtain is a derivation of my GPL-licensed code. Carry the license, or don't use that snippet - or, in the AI's case, do not emit GPL-derived code for non-GPL code bases.

This is all within the accepted ethics & law. Moreover, it's court tested too.

replies(1): >>stale2+Qy1
◧◩◪◨⬒⬓⬔⧯▣▦
250. stale2+Qy1[view] [source] [discussion] 2022-10-17 13:07:30
>>bayind+Ww1
> you're agreeing and obliged to carry

People are not agreeing though.

They are not agreeing, because there is a perfectly reasonable ethical and legal principle called fair use, which society has determined allows people to engage in limited use of other people's IP, no matter what the license says.

> Carry the license, or don't use

Or, instead of that, people could reasonably use fair use, and ignore the license, as fair use exists for many good legal and ethical reasons.

And no, you do not get to extend that out, to doing anything you want to do, just because there is a reasonable exception called fair use.

> do not emit GPL derived code for non-GPL code bases

Or, actually, yes do this. This is allowed because of the reasonable ethical and moral principle called fair use, which allows people to ignore your license.

replies(1): >>bayind+BD1
◧◩◪◨⬒⬓⬔⧯▣
251. rtkwe+TA1[view] [source] [discussion] 2022-10-17 13:18:50
>>behrin+nC
It's not broadcasting an exact replica, though; it's instructions to recreate an approximation of the original image. If I look at an image, describe it, and have someone else (or even myself) recreate it later, that in general isn't copyright infringement; that's just a normal process in art. A more extreme example is Shepard Fairey's Hope poster versus the original AP photo, but even that is more similar to the original than the output created by Stable Diffusion. Approximate recreations aren't generally copyright violations.

On the subject of trademarks, the issue falls, as far as I know, even more on the end user, because the protections around them concern use in commerce and consumer confusion, not merely recreating the mark the way copyright protections do.

◧◩◪◨⬒⬓⬔
252. rtkwe+KC1[view] [source] [discussion] 2022-10-17 13:28:04
>>blende+fJ
No, I hadn't heard of it; I don't follow copyright law extremely closely because it tends to annoy me. On its face, reading the case summaries and looking at the two pictures, it feels like the act of manually repainting and the color choices should be enough to render it a transformative work. That's one of the fundamental problems with trying to apply copyright to anything other than precise copies: art remixes and recombines all the time; it's fundamental to the process.
◧◩◪◨⬒⬓⬔⧯▣▦▧
253. bayind+BD1[view] [source] [discussion] 2022-10-17 13:33:30
>>stale2+Qy1
I will agree to disagree on your overly broad definition of fair use, which amounts to ingesting a whole code base and using its significant parts in another code base, with or without derivation, while disregarding the license attached to the whole and/or its parts.

Thanks for the discussion, and have a nice day.

I may not further comment on this thread from this point.

254. sireat+ID1[view] [source] 2022-10-17 13:33:57
>>kweing+(OP)
I must be in the minority of programmers: I really, really like Copilot but am indifferent about Stable Diffusion/DALL-E/Midjourney.

Copilot on Python makes me 5x more productive. I used Copilot in beta for a year and continue paying for it now.

For example: I can make a command-line data-wrangling script for a novel data set in a few minutes with a few prompts, with a full complement of extras (proper argparse parameters with sane defaults, ready to import, etc.). # reasonable comments included for free as well

Before Copilot I could do the same in about 20-30 minutes, but my code would be a mess with little commenting, and I would spend 30-60 minutes just looking up docs for various libraries.

Without Copilot, if all I were doing was writing data-wrangling scripts 4 hours a day, I could approach Copilot-like productivity for that single task.

However with Copilot I can switch problem domains very quickly and remain productive.

Interestingly, on something like CSS or JavaScript, Copilot helps only slightly, maybe because my local training set is insufficient and my web-dev prompts are too generic.

So I think AI can be a fantastic force multiplier for a skill set you are already reasonably familiar with. I can handle the 5-10% wtf Python code that Copilot produces.

I do not particularly like copyrights and do wish Copilot had been trained on private Microsoft code as well.
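For a sense of scale, the kind of argparse scaffolding described above might look like this (a minimal sketch with hypothetical column and file names, not the commenter's actual script):

```python
import argparse
import csv
import sys


def parse_args(argv=None):
    # Sane defaults so the script is usable with just an input path
    p = argparse.ArgumentParser(
        description="Keep CSV rows whose given column matches a value"
    )
    p.add_argument("infile", help="input CSV path")
    p.add_argument("--column", default="status", help="column to filter on")
    p.add_argument("--value", default="active", help="value to keep")
    p.add_argument("--delimiter", default=",", help="CSV field delimiter")
    return p.parse_args(argv)


def filter_rows(rows, column, value):
    # Keep only rows whose selected column equals the target value
    return [r for r in rows if r.get(column) == value]


def main(argv=None):
    args = parse_args(argv)
    with open(args.infile, newline="") as f:
        rows = list(csv.DictReader(f, delimiter=args.delimiter))
    kept = filter_rows(rows, args.column, args.value)
    if kept:
        writer = csv.DictWriter(sys.stdout, fieldnames=kept[0].keys())
        writer.writeheader()
        writer.writerows(kept)


if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

The functions are importable on their own, matching the "ready to import" point: another script can call `filter_rows` directly without going through the CLI.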

◧◩◪◨⬒⬓
255. rtkwe+iE1[view] [source] [discussion] 2022-10-17 13:36:28
>>keving+Si
That's clearly lifting style, pose, and general location, but in each of those there are changes. Even for the original art we could find tons of examples of very similar poses and backgrounds, because an anime girl in a bathing suit against a beach background isn't that original an image at the concept level. That pose is also pretty well worn.

This is the problem of applying the idea of ownership to ideas and expression like art. Art in particular is a very remix and recombination driven field.

replies(1): >>keving+iG4
◧◩◪◨⬒⬓
256. tables+cI1[view] [source] [discussion] 2022-10-17 13:53:30
>>throwa+Md
> You'll never be able to push it to the fringes because there will never be a legal universal agreement even from country to country on where to draw the line.

Huh? "Legal universal agreement" has never been required to push something to the fringes in a particular country.

If (in the US) these models were declared to be copyright infringement, or users were required to pay license fees to the creators of the data used to build the models, they would vanish from the public sphere. GitHub/Microsoft's legal department would pull Copilot down immediately, and development would effectively cease. No US company would sponsor development, and no company would allow in-house use. It would be dead.

Some dude might still run the model in his bedroom in his spare time on his own hardware, but that's what irrelevance looks like.

> And as computers get more powerful and the models get more efficient it'll become easier and easier to self host and run them on your own dime. There are already one click installers for generative models such as stable diffusion that run on modest hardware from a few years back.

If that's the only way you can run something, because it's illegal, you're describing a fringe technology right there.

◧◩◪◨⬒⬓
257. tables+oJ1[view] [source] [discussion] 2022-10-17 13:58:46
>>faerie+ub
> Cyberpunk sure, but fantasy? Not at all.

The fantasy is the idea that doing what you describe will matter.

◧◩◪◨⬒⬓
258. rtkwe+pJ1[view] [source] [discussion] 2022-10-17 13:58:49
>>rfrec0+bK
That's a violation, but it's not what SD is doing. It's not copying; it's recreating a similar (sometimes extremely similar) image.
◧◩◪◨⬒⬓⬔⧯▣▦
259. sidewn+IM1[view] [source] [discussion] 2022-10-17 14:15:57
>>rfrec0+6L
The DMCA disagrees. Specific methods of "circumvention" which inevitably take the form of a software tool are prohibited. Tools and their authors can be held accountable.
◧◩◪◨⬒⬓⬔
260. omnimu+RV1[view] [source] [discussion] 2022-10-17 14:55:23
>>matkon+Fk
But it means the models were trained on images that are under copyright. In fact many of these models were trained exclusively on such images without any permission. For example Midjourney is clearly trained on everything on artstation.com where almost all images have commercial purpose / licenses.
◧◩◪◨⬒⬓
261. rtkwe+jX1[view] [source] [discussion] 2022-10-17 15:01:02
>>didibu+sA
The real kink in that application of derivative work, to me, is that the entire dataset goes into the model and is, to some vanishingly small extent, used in every output. How can we meaningfully assign ownership through that transition and mixing? And when we do, how do we do it without exacerbating the existing problems of copyright in art? We already can't use characters and settings created during our own lifetimes in our own expression, because Disney got life + 70 through Congress.
◧◩◪◨⬒⬓
262. kmeist+A23[view] [source] [discussion] 2022-10-17 19:47:00
>>lofatd+lD
Suing Microsoft for training Copilot on your code would require jumping over the same hurdle that the Authors Guild could not: i.e. that it is fair use to scan a massive corpus[0] of texts (or images) in order to search through them.

My real worry is downstream infringement risk, since fair use is non-transitive. Microsoft can legally provide you a code generator AI, but you cannot legally use regurgitated training set output[1]. GitHub Copilot is creating all sorts of opportunities to put your project in legal jeopardy and Microsoft is being kind of irresponsible with how they market it.

[0] Note that we're assuming published work. Doing the exact same thing Microsoft did, but on unpublished work (say, for irony's sake, the NT kernel source code) might actually not be fair use.

[1] This may give rise to some novel inducement claims, but the irony of anyone in the FOSS community relying on MGM v. Grokster to enforce the GPL is palpable.

◧◩◪◨⬒⬓⬔
263. ISL+UF3[view] [source] [discussion] 2022-10-17 23:39:50
>>bugfix+jN
Find a good and ambitious copyright attorney with some free capacity.

Also, register your code with the copyright office.

Edit: Apparently, with the #1 post on HN right now, you could also just go here: https://githubcopilotinvestigation.com/

◧◩◪◨⬒⬓⬔⧯▣▦
264. monoca+JO3[view] [source] [discussion] 2022-10-18 00:44:29
>>Americ+mm
Google v. Oracle ended with a six-line function not being granted de minimis protection. What you're talking about is arguably common sense, but it is not based on current case law in the US.
◧◩◪◨⬒
265. kodah+FU3[view] [source] [discussion] 2022-10-18 01:37:37
>>heavys+ff
Ah! You're right.
◧◩◪
266. jrm4+e14[view] [source] [discussion] 2022-10-18 02:42:14
>>Shamel+rl
I mean, I don't have to do it alone. There's this thing called "the law" that's put in a little work on this :)
replies(1): >>Shamel+5p7
◧◩◪◨⬒⬓⬔
267. keving+iG4[view] [source] [discussion] 2022-10-18 09:39:23
>>rtkwe+iE1
I think the key detail is to look at what happened in the bottom left - in the original drawing, there's dark blue (due to lighting) cloth filling the scene, but the network has instead generated oddly-hued water there, even though on the right side there's sand from the beach shore. There's seemingly no geometric representation driving the AI so it ended up turning clothing into mystery ocean water when synthesizing an image that (for whatever reason) looked like the original one. It's an interesting error to me because it only looks Wrong once you notice the sand on the right.
◧◩◪◨⬒⬓
268. matkon+tz5[view] [source] [discussion] 2022-10-18 15:04:21
>>rtkwe+gn
It is a clear case of derivative work (see also https://commons.wikimedia.org/wiki/Commons:Derivative_works - internal docs, but their explanation of copyright status tends to be well done)
◧◩◪◨⬒⬓⬔
269. matkon+Nz5[view] [source] [discussion] 2022-10-18 15:05:16
>>Fillig+Kt
It is a clear case of derivative work (see also https://commons.wikimedia.org/wiki/Commons:Derivative_works - internal docs, but their explanation of copyright status tends to be well done)

This specific one would not be a problem, but doing it with a still copyrighted work would be.

◧◩◪◨
270. Shamel+5p7[view] [source] [discussion] 2022-10-18 23:16:45
>>jrm4+e14
Fair enough; from a legal framework that's a highly practical way to move forward. I don't think many people like to point at the current status quo, with all its flaws, and immediately think "the courts will help me!"; but you did mention you are a lawyer and I respect that you are working in the bounds of what is possible rather than what is ideal.
[go to top]