zlacker

[parent] [thread] 90 comments
1. rtkwe+(OP)[view] [source] 2022-10-16 21:45:01
I think copilot is a clearer copyright violation than any of the stable diffusion projects though because code has a much narrower band of expression than images. It's really easy to look at the output of CoPilot and match it back to the original source and say these are the same. With stable diffusion it's much closer to someone remixing and aping the images than it is reproducing originals.

I haven't been following super closely but I don't know of any claims or examples where input images were recreated to a significant degree by stable diffusion.

replies(9): >>makeit+q8 >>mr_toa+49 >>Americ+ca >>paulgb+hb >>pavlov+Uc >>DannyB+ve >>bigiai+oj >>kmeist+cp >>dv_dt+vA
2. makeit+q8[view] [source] 2022-10-16 23:01:38
>>rtkwe+(OP)
I think this is exactly the gap the GP is mentioning: to a trained artist it is clear as day that the original image has been lifted wholesale, even if, for instance, the colors are adjusted here and there.

You put it as a remix, but remixes are credited and expressed as such.

replies(2): >>jzb+l9 >>omnimu+X9
3. mr_toa+49[view] [source] 2022-10-16 23:07:35
>>rtkwe+(OP)
> I haven't been following super closely but I don't know of any claims or examples where input images were recreated to a significant degree by stable diffusion.

I think that the argument being made by some artists is that the training process itself violates copyright just by using the training data.

That’s quite different from arguing that the output violates copyright, which is what the tweet in this case was about.

replies(1): >>rtkwe+gf
◧◩
4. jzb+l9[view] [source] [discussion] 2022-10-16 23:09:13
>>makeit+q8
I haven’t seen any side by sides that seem like a lift. Any examples?

I don’t see Midjourney (et al) as remixes, myself. More like “inspired by.”

replies(3): >>omnimu+ma >>keving+ua >>matkon+gc
◧◩
5. omnimu+X9[view] [source] [discussion] 2022-10-16 23:14:16
>>makeit+q8
Exactly. To a programmer, Copilot is a clear violation; to a writer, GPT-3 is a clear violation; to an artist, DALL-E 2 is a clear violation. The artist might love Copilot, the writer might love DALL-E, the programmer might love GPT-3.

It's all the same; they just don't realize it.

replies(1): >>sidewn+5g
6. Americ+ca[view] [source] 2022-10-16 23:16:22
>>rtkwe+(OP)
I don’t think copilot is intrinsically a copyright violation, as you seem to be alluding to. Examples like this seem to be more controversial, but I’m not sure there’s a clear copyright violation there either.

If you asked every developer on earth to implement FizzBuzz, how many actually different implementations would you get? Probably not very many. Who should own the copyright for each of them? Would the outcome be different for any other product feature? If you asked every dev on earth to write a function that checked a JWT claim, how many of them would be more or less exactly the same? Would that be a copyright violation? I hope the courts answer some of these questions one day.
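For reference, the whole exercise is about a dozen lines; a typical implementation (my own sketch, in Python) looks like:

```python
def fizzbuzz(n):
    """Return the FizzBuzz sequence for 1..n as a list of strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:          # divisible by both 3 and 5
            out.append("FizzBuzz")
        elif i % 3 == 0:
            out.append("Fizz")
        elif i % 5 == 0:
            out.append("Buzz")
        else:
            out.append(str(i))
    return out
```

Swap the loop for a comprehension or reorder the branches and you've already enumerated most of the "substantially different" variants.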

replies(4): >>stuart+Ca >>datafl+4b >>reacha+wk >>didibu+Xs
◧◩◪
7. omnimu+ma[view] [source] [discussion] 2022-10-16 23:18:08
>>jzb+l9
It's clear where the know-how was lifted from; it doesn't matter if the final image is somewhat unique (almost every image is).
replies(1): >>matkon+hc
◧◩◪
8. keving+ua[view] [source] [discussion] 2022-10-16 23:18:57
>>jzb+l9
Not safe for work, but one example I saw going around:

https://twitter.com/ebkim00/status/1579485164442648577

Not sure if this was fed the original image as an input or not.

I've also seen a couple of cases where people explicitly trained a network to imitate an artist's work, like that of the late Kim Jung Gi.

replies(2): >>lbotos+7c >>rtkwe+Uv1
◧◩
9. stuart+Ca[view] [source] [discussion] 2022-10-16 23:20:25
>>Americ+ca
> If you asked every developer on earth to implement FizzBuzz, how many actually different implementations would you get?

Thousands at least. Some of which would actually work.

replies(1): >>Americ+db
◧◩
10. datafl+4b[view] [source] [discussion] 2022-10-16 23:23:54
>>Americ+ca
> If you asked every developer on earth to implement FizzBuzz, how many actually different implementations would you get?

Does it matter? If you examined every copyright lawsuit on earth over code, how many of them would actually be over FizzBuzz?

replies(1): >>Americ+ib
◧◩◪
11. Americ+db[view] [source] [discussion] 2022-10-16 23:24:33
>>stuart+Ca
There’s a finite number of ways to implement a working FizzBuzz (or anything else) in any given language, that aren’t substantially similar, is my point. At least without introducing pointless code for the explicit purpose of making it look different.
12. paulgb+hb[view] [source] 2022-10-16 23:25:16
>>rtkwe+(OP)
I don’t know of any examples of images being wholly recreated, but it’s certainly possible to use the name of some living artists to get work in their style. In those cases, it seems like not such a leap to say that the AI has obviously seen that artist’s work and that the output is a derivative work. (The obvious counterargument is that this is the same as a human looking at an artist’s work and aping the style.)
replies(3): >>matkon+bc >>Spivak+cc >>nl+zr
◧◩◪
13. Americ+ib[view] [source] [discussion] 2022-10-16 23:25:36
>>datafl+4b
The same rationale applies to any other simple code block, as I elaborated on.
replies(1): >>datafl+yb
◧◩◪◨
14. datafl+yb[view] [source] [discussion] 2022-10-16 23:29:34
>>Americ+ib
And my point is you don't have lawsuits over one simple code block.
replies(1): >>Americ+6c
◧◩◪◨⬒
15. Americ+6c[view] [source] [discussion] 2022-10-16 23:37:05
>>datafl+yb
This entire thread is about how copilot committed a copyright violation on a simple code block.
replies(1): >>datafl+1d
◧◩◪◨
16. lbotos+7c[view] [source] [discussion] 2022-10-16 23:37:10
>>keving+ua
It's really interesting. I suspect the face was inpainted in, or this was an "img2img".

I think over time we are going to see the following:

- If you take say a star wars poster, and inpaint in a trained face over luke's, and sell that to people as a service, you will probably be approached for copyright and trademark infringement.

- If you are doing the above with a satirical take, you might be able to claim fair use.

- If you are using AI as a "collage generator" to smash together a ton of prompts into a "unique" piece, you may be safe from infringement but you are taking a risk as you don't know what % of source material your new work contains. I'd like to imagine if you inpaint in say 20 details with various sub-prompts that you are getting "safer".

replies(1): >>numpad+lj
◧◩
17. matkon+bc[view] [source] [discussion] 2022-10-16 23:37:45
>>paulgb+hb
https://alexanderwales.com/wp-content/uploads/2022/08/image....

Left: “Girl with a Pearl Earring, by Johannes Vermeer” by Stable Diffusion
Right: Girl with a Pearl Earring by Johannes Vermeer

This specific one is not a copyright violation, as the original is old enough for its copyright to have expired. But the same may happen with other images.

from https://alexanderwales.com/the-ai-art-apocalypse/ and https://alexanderwales.com/addendum-to-the-ai-art-apocalypse...

replies(2): >>london+4e >>rtkwe+Se
◧◩
18. Spivak+cc[view] [source] [discussion] 2022-10-16 23:37:45
>>paulgb+hb
It’s not a copyright violation to commission an artist to make you something in the style of another artist, and it’s also not copyright infringement for the artist you hired to look at that artist’s work to learn what that style means. And it’s not always infringement to draw another artist’s work in your own style, same as reimplementing code.

If you “trace” another artist’s work, though, the hammer comes down. With Copilot it’s way easier to get it to obviously trace.

replies(1): >>rfrec0+NB
◧◩◪
19. matkon+gc[view] [source] [discussion] 2022-10-16 23:38:22
>>jzb+l9
https://alexanderwales.com/wp-content/uploads/2022/08/image....

Left: “Girl with a Pearl Earring, by Johannes Vermeer” by Stable Diffusion
Right: Girl with a Pearl Earring by Johannes Vermeer

This specific one is not a copyright violation, as the original is old enough for its copyright to have expired. But the same may happen with other images.

from https://alexanderwales.com/the-ai-art-apocalypse/ and https://alexanderwales.com/addendum-to-the-ai-art-apocalypse...

replies(1): >>Fillig+ml
◧◩◪◨
20. matkon+hc[view] [source] [discussion] 2022-10-16 23:38:53
>>omnimu+ma
Style is not copyrightable under current rules.
replies(1): >>omnimu+tN1
21. pavlov+Uc[view] [source] 2022-10-16 23:44:42
>>rtkwe+(OP)
Stable Diffusion sometimes reproduces the large watermarks used by stock photo providers on their free sample images. That’s embarrassing at the minimum, and potentially a trademark violation.
replies(1): >>bigiai+Ij
◧◩◪◨⬒⬓
22. datafl+1d[view] [source] [discussion] 2022-10-16 23:45:14
>>Americ+6c
That code block is neither "simple like FizzBuzz" nor is it in a lawsuit. I feel like we're speaking past each other at this point.
replies(1): >>Americ+Yd
◧◩◪◨⬒⬓⬔
23. Americ+Yd[view] [source] [discussion] 2022-10-16 23:53:17
>>datafl+1d
What makes it not simple like FizzBuzz? You will not be able to come up with a reason why this one single function is copyrightable, but a FizzBuzz function isn’t. It’s one function in 15 lines of code. Get 1,000,000 developers to implement that function and you’re not going to have anywhere near 1,000,000 substantially different implementations.
replies(2): >>datafl+Ge >>monoca+lG3
◧◩◪
24. london+4e[view] [source] [discussion] 2022-10-16 23:54:39
>>matkon+bc
I think this happens a lot with famous images since that image will be in the training set hundreds of times.

Even if deduplication efforts are done, that painting will still be in the background of movie shots etc.

25. DannyB+ve[view] [source] 2022-10-16 23:57:50
>>rtkwe+(OP)
Well no.

Code is only protected to the degree it is creative and not functionally driven anyway.

So the reduced band of possible expression often directly reduces the protectability-through-copyright.

◧◩◪◨⬒⬓⬔⧯
26. datafl+Ge[view] [source] [discussion] 2022-10-16 23:58:48
>>Americ+Yd
For one thing FizzBuzz is like... 5-6 statements? This function has 13. FizzBuzz has a whopping 1 variable to keep track of. This function has so many I'm not even going to try to count. I'm not going to keep arguing about this, but if you want to believe they're equally simple then you'll just have a hard time convincing other people. That's all I have left to say on this.
replies(2): >>CapsAd+Xj >>SAI_Pe+1r
◧◩◪
27. rtkwe+Se[view] [source] [discussion] 2022-10-17 00:00:02
>>matkon+bc
> Left: “Girl with a Pearl Earring, by Johannes Vermeer” by Stable Diffusion Right: Girl with a Pearl Earring by Johannes Vermeer

Even that, if done by a person, would as far as I understand not constitute copyright infringement. It's a separate work mimicking Vermeer's original. The closest real-world equivalent I can think of is probably AP v. Shepard Fairey over the Obama "Hope" poster, but that settled out of court, so we don't really know where that kind of reproduction stands legally. On top of that, the SD image isn't just a recoloring with some additions like Fairey's was, so it's not even as close to the original as that case.

replies(2): >>blende+RA >>matkon+5r5
◧◩
28. rtkwe+gf[view] [source] [discussion] 2022-10-17 00:02:16
>>mr_toa+49
I'm dubious of that in cases where the training set isn't distributed. If we call the training itself copyright infringement, is downloading an image infringement? Is caching?
replies(1): >>didibu+4s
◧◩◪
29. sidewn+5g[view] [source] [discussion] 2022-10-17 00:08:48
>>omnimu+X9
Does dalle-2 verbatim reproduce artwork? I have never used it.
replies(1): >>CapsAd+Ri
◧◩◪◨
30. CapsAd+Ri[view] [source] [discussion] 2022-10-17 00:39:31
>>sidewn+5g
It's kind of like having millions of parameters you can tweak to get to an image. So an image does not really exist in the model.

I can imagine Mona Lisa in my head, but it doesn't really "exist" verbatim in my head. It's only an approximation.

I believe copilot works the same way (?)

replies(2): >>heavys+8k >>hacker+9s
◧◩◪◨⬒
31. numpad+lj[view] [source] [discussion] 2022-10-17 00:43:18
>>lbotos+7c
Features outside the face are lost or changed from the original on the right, so it can’t be face inpainting. It’s unlikely to be style transfer, because some body parts are moved. Most plausibly this was generated.

So much for “generation”: it seems as if these models are just overfitting on the extremely small subset of the input data that they did not utterly fail to train on, almost as if a genius could directly generate the weight data from those images without all the gradient-descent business.

32. bigiai+oj[view] [source] 2022-10-17 00:43:32
>>rtkwe+(OP)
> With stable diffusion it's much closer to someone remixing and aping the images than it is reproducing originals.

So very similar to how the music industry treats sampling then?

Everybody using CoPilot needs to get "code sample clearance" from the original copyright holder before publishing their remix or new program that uses snippets of somebody else's code...

Try explaining _that_ to your boss and legal department.

"To: <all software dev> Effective immediately, any use of Github is forbidden without prior written approval from both the CTO and General Counsel."

replies(1): >>kmeist+up
◧◩
33. bigiai+Ij[view] [source] [discussion] 2022-10-17 00:46:44
>>pavlov+Uc
Surely at the very least it'd be a TOS violation? I doubt any stock photo service grants you enough rights to redistribute their watermarked free image samples? Especially not in the context of a project like Stable Diffusion?
replies(1): >>Fillig+Yk
◧◩◪◨⬒⬓⬔⧯▣
34. CapsAd+Xj[view] [source] [discussion] 2022-10-17 00:48:16
>>datafl+Ge
It doesn't seem that far off to me. Copyright makes more sense in a larger context, such as making a Windows clone by copy pasting code from some Windows leak.

Without that context, fizzbuzz is not that different from a matrix transpose function to me.

◧◩◪◨⬒
35. heavys+8k[view] [source] [discussion] 2022-10-17 00:51:09
>>CapsAd+Ri
NNs can and do encode information from their training sets in the models themselves, sometimes verbatim.

Sometimes the original information is there in the model, encoded/compressed/however you want to look at it, and can be reproduced.

◧◩
36. reacha+wk[view] [source] [discussion] 2022-10-17 00:55:12
>>Americ+ca
Copyright is for original whole works. Utility functions don’t fall under that, I don’t think.

I suppose whoever wants to pay the fees would “own” these things ?

https://www.copyright.gov/circs/circ61.pdf

◧◩◪
37. Fillig+Yk[view] [source] [discussion] 2022-10-17 01:00:06
>>bigiai+Ij
But it's not reproducing their samples. It's just adding their watermark to newly generated pictures you can't find in the training set.
replies(3): >>rovr13+no >>dougab+Is >>dragon+3w
◧◩◪◨
38. Fillig+ml[view] [source] [discussion] 2022-10-17 01:02:46
>>matkon+gc
If a human drew that, it would not be a copyright violation.
replies(4): >>mattkr+mr >>Thorre+yt >>makeit+Tw >>matkon+pr5
◧◩◪◨
39. rovr13+no[view] [source] [discussion] 2022-10-17 01:35:40
>>Fillig+Yk
If the watermark is their logo or name, it could be copyrighted or trademarked.
replies(1): >>nl+fr
40. kmeist+cp[view] [source] 2022-10-17 01:44:28
>>rtkwe+(OP)
The reason why it's easy to match Copilot results back to the original source is that the users are starting with prompts that match their public code, deliberately to cause prompt regurgitation.

Stable Diffusion actually has a similar problem. Certain terms that directly call up a particular famous painting by name - say, the Mona Lisa[0] - will just produce that painting, possibly tiled on top of itself, and it won't bother with any of the other keywords or phrases you throw at it.

The underlying problem is that the AI just outright forgets that it's supposed to create novel works when you give it anything resembling the training set data. If it was just that the AI could spit out training set data when you ask for it, I wouldn't be concerned[1], but this could also happen inadvertently. This would mean that anyone using Copilot to write production code would be risking copyright liability. Through the AI they have access to the entire training set, and the AI has a habit of accidentally producing output that's substantially similar to it. Those are the two prongs of a copyright infringement claim right there.

[0] For the record I was trying to get it to draw a picture of the Mona Lisa slapping Yoshikage Kira across the cheek

[1] Anyone using an AI system to "launder" creative works is still infringing copyright. AI does not carve a shiny new loophole in the GPL.

replies(4): >>xani_+Ps >>thetea+Gu >>joe-co+Pu >>llimll+Bv
◧◩
41. kmeist+up[view] [source] [discussion] 2022-10-17 01:46:49
>>bigiai+oj
This is already a problem with anyone who ever copypastes from Stack Overflow. You're all violating CC-BY-SA[0] and nobody really cares about this.

[0] https://stackoverflow.com/help/licensing

replies(1): >>bscphi+lt
◧◩◪◨⬒⬓⬔⧯▣
42. SAI_Pe+1r[view] [source] [discussion] 2022-10-17 02:00:43
>>datafl+Ge
SCO v. IBM[1] included claims of sections as small as "…ranging from five to ten to fifteen lines of code in multiple places that are of issue…" in some of the individual claims of the case.

[1] https://en.wikipedia.org/wiki/SCO_Group,_Inc._v._Internation....

replies(1): >>datafl+tr
◧◩◪◨⬒
43. nl+fr[view] [source] [discussion] 2022-10-17 02:02:07
>>rovr13+no
And it's the responsibility of the person using the tool to generate that image not to violate copyright by redistributing it.
replies(2): >>behrin+Zt >>MereIn+mu
◧◩◪◨⬒
44. mattkr+mr[view] [source] [discussion] 2022-10-17 02:02:54
>>Fillig+ml
I’m not so sure about that.

The scenes à faire doctrine would certainly let you paint your own picture of a pretty girl with a large earring, even a pearl one. That, however, is definitely the same person, in the same pose/composition, in the same outfit. The colors are slightly off, but the difference feels like a technical error rather than an expressive choice.

replies(2): >>Thorre+ot >>boulos+vt
◧◩◪◨⬒⬓⬔⧯▣▦
45. datafl+tr[view] [source] [discussion] 2022-10-17 02:04:06
>>SAI_Pe+1r
The "..." part you redacted out explicitly said "it is many different sections of code". It was (quite obviously) not one or two 5-line blocks of code, let alone "simple" ones like FizzBuzz.
replies(1): >>Americ+uE
◧◩
46. nl+zr[view] [source] [discussion] 2022-10-17 02:04:26
>>paulgb+hb
> In those cases, it seems like not such a leap to say that the AI has obviously seen that artist’s work and that the output is a derivative work.

"Copying" a style is not a derivative work:

> Why isn't style protected by copyright? Well for one thing, there's some case law telling us it isn't. In Steinberg v. Columbia Pictures, the court stated that style is merely one ingredient of expression and for there to be infringement, there has to be substantial similarity between the original work and the new, purportedly infringing, work. In Dave Grossman Designs v. Bortin, the court said that:

> "The law of copyright is clear that only specific expressions of an idea may be copyrighted, that other parties may copy that idea, but that other parties may not copy that specific expression of the idea or portions thereof. For example, Picasso may be entitled to a copyright on his portrait of three women painted in his Cubist motif. Any artist, however, may paint a picture of any subject in the Cubist motif, including a portrait of three women, and not violate Picasso's copyright so long as the second artist does not substantially copy Picasso's specific expression of his idea."

https://www.thelegalartist.com/blog/you-cant-copyright-style

◧◩◪
47. didibu+4s[view] [source] [discussion] 2022-10-17 02:09:54
>>rtkwe+gf
I think it's more a question of derivative work. Normally derivative work is an infringement unless it falls under fair use.

Now a human can take inspiration from like 100 different sources and probably end up with something that no one would recognize as derivative to any of them. But it also wouldn't be obvious that the human did that.

But with an ML model, it's clearly a derivative in that the learned function is mathematically derived from its dataset and so is all the resulting outputs.

I think this raises a new question, though, because until now "derivative" kind of implied that the output was recognizable as derived.

With AI, you can tweak it so the output doesn't end up easily recognizable as derived, but we know it's still derived.

Personally I think what really matters is more a question of what should be the legal framework around it. How do we balance the interests of AI companies and that of developers, artists, citizens who are the authors of the dataset that enabled the AI to exist. And what right should each party be given?

replies(1): >>rtkwe+VO1
◧◩◪◨⬒
48. hacker+9s[view] [source] [discussion] 2022-10-17 02:10:29
>>CapsAd+Ri
This is just nonsense.

It's similar to saying that any digital representation of an image isn't an image, just a dataset that represents it.

If what you said were any sort of defense, image copyright would never apply to any digital image, because images can be saved in different resolutions and file formats, or re-encoded; e.g., if a JPEG 'image' were only an image at one exact set of bits, I could save it again with a different quality setting and end up with a different set of bits.

But everyone still recognises when an image looks the same, and courts will uphold copyright claims regardless of the digital encoding of an image. So good luck with that spurious argument that it's not copyright infringement because 'it's on the internet' (or 'it's AI', etc.).

replies(1): >>CapsAd+Cv
◧◩◪◨
49. dougab+Is[view] [source] [discussion] 2022-10-17 02:15:24
>>Fillig+Yk
If it faithfully memorized and reproduced a set of watermarks, it would be premature to conclude that it hadn’t memorized other (non-generic) graphical elements.
◧◩
50. xani_+Ps[view] [source] [discussion] 2022-10-17 02:16:14
>>kmeist+cp
> The reason why it's easy to match Copilot results back to the original source is that the users are starting with prompts that match their public code, deliberately to cause prompt regurgitation.

The reason doesn't really matter...

replies(1): >>lofatd+Xu
◧◩
51. didibu+Xs[view] [source] [discussion] 2022-10-17 02:17:49
>>Americ+ca
I think the issue people have is that every developer trying to implement FizzBuzz will not have studied all the existing public copyrighted implementations. They will likely be reinventing the solution with maybe never having seen an existing FizzBuzz implementation or having only seen one or two at most, and probably won't be re-implementing it verbatim.

But the machine learning model has studied every single one of them.

And perhaps more preposterous: if its dataset had no FizzBuzz implementation, would it even be able to re-invent it?

I feel this is the big distinction that probably annoys people.

That, and the general worry that it'll devalue the experienced developer: AI will make hard things easier and require less effort and talent to learn, making developers less in demand and probably lower paid.

◧◩◪
52. bscphi+lt[view] [source] [discussion] 2022-10-17 02:22:49
>>kmeist+up
If I ever take any code from SO, I include a comment with a link to it. Surely that's standard practice for anything longer than a line or two?
replies(1): >>fourth+gE
◧◩◪◨⬒⬓
53. Thorre+ot[view] [source] [discussion] 2022-10-17 02:23:22
>>mattkr+mr
Even if it is an expressive choice of the new artist, if enough of the original artist's expressive choice remains, it could still be a copyright violation. Fair use can sometimes be a defense, but there are a lot of factors that go into determining whether something is fair use.
◧◩◪◨⬒⬓
54. boulos+vt[view] [source] [discussion] 2022-10-17 02:24:02
>>mattkr+mr
Really? It looks like some bad Warhol take on the Vermeer original.
replies(1): >>mattkr+ZD
◧◩◪◨⬒
55. Thorre+yt[view] [source] [discussion] 2022-10-17 02:25:24
>>Fillig+ml
Why? Obviously it wouldn't be a copyright violation, because the original is old enough to no longer be copyrighted. But other than age?
replies(1): >>atchoo+qL
◧◩◪◨⬒⬓
56. behrin+Zt[view] [source] [discussion] 2022-10-17 02:30:08
>>nl+fr
The tool is already redistributing it.

A broadcaster of copyrighted works is not protected against infringement just because they expect viewers to only watch programming they own.

replies(1): >>rtkwe+vs1
◧◩◪◨⬒⬓
57. MereIn+mu[view] [source] [discussion] 2022-10-17 02:34:53
>>nl+fr
Just like it's the person's responsibility to only recombine jpeg basis states when they don't correspond to a copyrighted image? It seems more and more to be the case that the trained model is, in large part, a very compact representation of the training data. I'm not seeing a difference between distributing a model that can be used to reconstruct the input images, as opposed to distributing jpeg basis states that can be used to reconstruct the original image.
◧◩
58. thetea+Gu[view] [source] [discussion] 2022-10-17 02:38:03
>>kmeist+cp
> The reason why it's easy to match Copilot results back to the original source is that the users are starting with prompts that match their public code, deliberately to cause prompt regurgitation.

Sounds like MS has devised a massive automated code laundering racket.

replies(1): >>ISL+Nw
◧◩
59. joe-co+Pu[view] [source] [discussion] 2022-10-17 02:39:45
>>kmeist+cp
I think that's backwards. The AI doesn't "forget", it never even knew what novelty is in the first place.
◧◩◪
60. lofatd+Xu[view] [source] [discussion] 2022-10-17 02:41:10
>>xani_+Ps
GP is just highlighting why this is so common and often a challenging edge case. If you ask it for something that's exactly in its dataset, the "best" solution that minimizes loss will be that existing code. Thus, it's somewhat intrinsic to applying statistical learning to text completion.

This means MS really shouldn't have used copyleft code at all, and really shouldn't be selling Copilot in this state, but "luckily" for them, short of a class-action suit I don't really see any recourse for the programmers whose work they're reselling.

replies(2): >>fweime+xD >>kmeist+cU2
◧◩
61. llimll+Bv[view] [source] [discussion] 2022-10-17 02:49:33
>>kmeist+cp
I tried some very simple queries with copilot on random stuff, and tried to trace it back to the source. I was successful about 1/3 of the time.

(Sorry I didn't log my experiment results at the time. None of it was related to work I'd done - I used time adjustment functions if I remember correctly)

◧◩◪◨⬒⬓
62. CapsAd+Cv[view] [source] [discussion] 2022-10-17 02:49:37
>>hacker+9s
I don't understand what is supposed to be nonsense about how it works? Your response seems to be about something entirely different.

But anyway, how I see Stable Diffusion being different is that it's a tool to generate all sorts of images, including copyrighted ones.

It's more like a database of *how to* generate images than a database *of* images. Maybe there isn't that much of a difference when it comes to copyright law. If you ask an artist to draw a copyrighted image for you, who should be in trouble? I'd say the person asking, most of the time, but in this case we argue it's the people behind the pencil, or whatever. Why? Because it's too easy? Where does a service like Fiverr stand here?

So if a tool is able to generate something that looks indistinguishable from some copyrighted artwork, is it infringing on copyright? I can get on board with yes if it was trained on that copyrighted artwork, but otherwise I'm not so sure.

replies(1): >>rfrec0+IC
◧◩◪◨
63. dragon+3w[view] [source] [discussion] 2022-10-17 02:55:05
>>Fillig+Yk
The watermark of a stock photo service is usually copyright protected, and also a (usually registered) trademark.
◧◩◪
64. ISL+Nw[view] [source] [discussion] 2022-10-17 03:06:23
>>thetea+Gu
Seems more like a massive class-action copyright target, potentially at ($50k/infraction) x (the number of usages).
replies(2): >>bugfix+VE >>thetea+U11
◧◩◪◨⬒
65. makeit+Tw[view] [source] [discussion] 2022-10-17 03:08:20
>>Fillig+ml
If the original art is still copyrighted, and you’d start selling your hand drawn variation, you’d totally be violating the copyright.

To make it concrete, imagine the latest Disney movie poster. You redraw it 95% close to the original, just changing the actual title. Then you sell your poster on Amazon at half the price of the actual poster. Would you get a copyright strike?

66. dv_dt+vA[view] [source] 2022-10-17 04:01:25
>>rtkwe+(OP)
I suspect it’s going to be a discussion similar to the introduction of music sampling, followed by a lot of litigation, followed by a settling of law on the matter.

The interesting part is if AI will be considered a tooling mechanism much like the tooling used to record and manipulate a music sample into a new composition.

◧◩◪◨
67. blende+RA[view] [source] [discussion] 2022-10-17 04:06:56
>>rtkwe+Se
Have you been following the Andy Warhol Prince drawing case?

It is currently at SCOTUS, so we should see a ruling for the USA sometime in the next year or so.

https://en.m.wikipedia.org/wiki/Andy_Warhol_Foundation_for_t...

replies(1): >>rtkwe+mu1
◧◩◪
68. rfrec0+NB[view] [source] [discussion] 2022-10-17 04:20:45
>>Spivak+cc
Right, but what if you commission an artist to create a work similar to an already existing piece of art, and the artist decides that the most efficient way to do that is to just place the original piece of art in a photocopier, crop out the copyright notice and the original artist's signature, and sell you the resulting print?
replies(1): >>rtkwe+1B1
◧◩◪◨⬒⬓⬔
69. rfrec0+IC[view] [source] [discussion] 2022-10-17 04:34:10
>>CapsAd+Cv
A tool can't be held accountable and can't infringe on copyright or any other law for that matter. It's more of a product. It seems to me like it's a gray area that's just going to have to be decided in court. Like, did the company that sells the tool that can very easily be used to do illegal things take enough reasonable measures to prevent it from being accidentally used in such a way? In the case of Copilot, I don't believe so, because there aren't really even any adequate warnings to the end user that say it can produce code which can only legally be used in software that meets the criteria of the original license.
replies(2): >>omnimu+851 >>sidewn+kE1
◧◩◪◨
70. fweime+xD[view] [source] [discussion] 2022-10-17 04:44:42
>>lofatd+Xu
Pretty much all code they have requires attribution, and based on reports, Copilot does not generate that along with the code. So excluding copyleft code (how would you even do that?) does not address the issue (assuming that the source code produced is actually a derivative work).
replies(1): >>lofatd+iH
◧◩◪◨⬒⬓⬔
71. mattkr+ZD[view] [source] [discussion] 2022-10-17 04:50:17
>>boulos+vt
That’s a really apt comparison, since the Supreme Court just heard Andy Warhol Foundation for the Visual Arts v. Goldsmith, which hinges on whether Warhol’s use of a copyrighted photo of Prince as the basis for “Orange Prince” was Fair Use.

Warhol’s estate seems likely to lose and their strongest argument is that Warhol took a documentary photo and transformed it into a commentary on celebrity culture. Here, I don’t even see that applying: it just looks like a bad copy.

https://www.scotusblog.com/2022/10/justices-debate-whether-w...

◧◩◪◨
72. fourth+gE[view] [source] [discussion] 2022-10-17 04:55:14
>>bscphi+lt
I do the same. I think it satisfies BY (attribution) but not SA (Share Alike).

As GP says, no one really cares, but it seems hard to satisfy SA... even if you are pasting into open source, is your license compatible with CC?

Perhaps I'm over-thinking this.

◧◩◪◨⬒⬓⬔⧯▣▦▧
73. Americ+uE[view] [source] [discussion] 2022-10-17 04:57:16
>>datafl+tr
So your claim is that the code in the OP tweet is actually not copyrightable, and it would only become a copyright violation if you also copied many additional code blocks from the same copyrighted work?
◧◩◪◨
74. bugfix+VE[view] [source] [discussion] 2022-10-17 05:03:53
>>ISL+Nw
Good. Where do I sign up?
replies(1): >>ISL+wx3
◧◩◪◨⬒
75. lofatd+iH[view] [source] [discussion] 2022-10-17 05:46:08
>>fweime+xD
That's a good point. I was thinking that during the curation phase of the dataset they should check for a LICENSE.txt file in the repo and batch-exclude all repositories containing copyleft or otherwise restrictively licensed code. This obviously won't handle every case, as you say, and when it does generate copyleft code it will still fail to attribute, but hopefully having no copyleft code in its dataset, or at least less of it, reduces the chance it generates code that perfectly satisfies its loss function by being exactly like something it has seen before.

The main problem I see with generating attribution is that the algorithm obviously doesn't "know" that it's generating identical code. Even in the original twitter post, the algorithm makes subtle, essentially semantically synonymous changes (like changing the commenting style). So for all intents and purposes it can't attribute the function, because it doesn't know _where_ the code is coming from, and copied code is indistinguishable from de novo code. Copilot will probably never be able to attribute code short of exhaustively checking its outputs, using some symbolic approach, against a database of copyleft/copyrighted code.
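For what it's worth, the batch-exclusion step is easy to sketch. This is a toy illustration of the idea only, not anything Copilot's actual pipeline does; the repo format and the (incomplete) set of license identifiers here are made up for the example:

```python
# Toy curation filter: drop repos whose detected license is copyleft
# (or undetected), keeping only permissively licensed ones for training.
COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-3.0"}

def filter_repos(repos):
    """Return only repos with a known, non-copyleft license."""
    kept = []
    for repo in repos:
        license_id = repo.get("license")  # e.g. parsed from LICENSE.txt
        if license_id is None or license_id in COPYLEFT:
            continue  # unknown or copyleft: exclude from the dataset
        kept.append(repo)
    return kept

repos = [
    {"name": "permissive-lib", "license": "MIT"},
    {"name": "copyleft-lib", "license": "GPL-3.0"},
    {"name": "no-license", "license": None},
]
print([r["name"] for r in filter_repos(repos)])  # → ['permissive-lib']
```

Of course, even MIT requires attribution, so a filter like this narrows the problem rather than solving it.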

◧◩◪◨⬒⬓
76. atchoo+qL[view] [source] [discussion] 2022-10-17 06:32:28
>>Thorre+yt
The photograph of the art, which will be more recent, might have copyright protections.

It looks like it wouldn't in the UK, probably wouldn't in the US, but would in Germany. The cases seem to hinge on the level of intellectual creativity involved in the photograph. The UK said that trying to create an exact copy was not an original endeavour, whereas Germany said the task of exact replication requires intellectual/technical effort of its own merit.

https://www.theipmatters.com/post/are-photographs-of-public-...

◧◩◪◨
77. thetea+U11[view] [source] [discussion] 2022-10-17 09:29:37
>>ISL+Nw
Both.
◧◩◪◨⬒⬓⬔⧯
78. omnimu+851[view] [source] [discussion] 2022-10-17 10:12:54
>>rfrec0+IC
The issue is not about what it produces. Copilot, I'm sure, has safeguards to avoid outputting copyrighted code verbatim (they even mention they have tests), so it will change the code enough to be legally safe.

The issue is in how it creates the output. Both DALL-E and Copilot can only work by taking the work people did in the past, sucking up their hard-earned know-how and creations, and remixing it, all while not crediting (or paying) anyone. The software itself might be great, but it only works because it was fed loads of quality material.

It's smart copy-and-paste with obfuscation. If that's legal, you can imagine it soon being used to rewrite whole codebases while avoiding any copyright. All the code would technically be different, but also the same.
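To make "technically different, but also the same" concrete, here's a toy sketch of my own (not how any real similarity detector works): strip comments and whitespace and normalize identifiers, and the "rewritten" code fingerprints identically to the original.

```python
import io
import keyword
import tokenize

def fingerprint(src):
    """Reduce Python source to a structural token stream: drop comments,
    newlines, and indentation tokens, and map every non-keyword name to
    the same placeholder, so renamed-but-identical code compares equal."""
    toks = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                        tokenize.INDENT, tokenize.DEDENT):
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            toks.append("ID")  # normalize away identifier renames
        else:
            toks.append(tok.string)
    return tuple(toks)

original = "def total(xs):\n    # sum the list\n    return sum(xs)\n"
obfuscated = "def accumulate(values):\n    return sum(values)\n"

print(fingerprint(original) == fingerprint(obfuscated))  # → True
```

A surface-level diff of the two snippets shows every line changed, yet structurally they are the same function.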

◧◩◪◨⬒⬓⬔
79. rtkwe+vs1[view] [source] [discussion] 2022-10-17 13:18:50
>>behrin+Zt
It's not broadcasting an exact replica, though; it's instructions to recreate an approximation of the original image. If I look at an image, describe it, and have someone else (or even myself) recreate it later, that in general isn't copyright infringement; that's just a normal process in art. A more extreme example is Shepard Fairey's "Hope" poster versus the original AP photo, and even that is more similar to the original than the output created by stable diffusion. Approximate recreations aren't generally copyright violations.

On the subject of trademarks, the issue falls even more on the end user as far as I know, because trademark protections are about use in commerce and consumer confusion, not about merely recreating the mark the way copyright protections are.

◧◩◪◨⬒
80. rtkwe+mu1[view] [source] [discussion] 2022-10-17 13:28:04
>>blende+RA
No, I hadn't heard of it; I don't follow copyright law extremely closely, as it tends to make me annoyed. On its face, reading the case summaries and looking at the two pictures, it feels like the act of manually repainting and the color choices should be enough to render it a transformative work. This is one of the fundamental problems with trying to apply copyright to anything other than precise copies: art remixes and recombines all the time; it's fundamental to the process.
◧◩◪◨
81. rtkwe+Uv1[view] [source] [discussion] 2022-10-17 13:36:28
>>keving+ua
That's clearly lifting the style, pose, and general location, but in each of those there are changes. Even for the original art we could find tons of examples of very similar poses and backgrounds, because an anime girl in a bathing suit on a beach isn't that original an image at the concept level. That pose is also pretty well worn.

This is the problem of applying the idea of ownership to ideas and expression like art. Art in particular is a very remix- and recombination-driven field.

replies(1): >>keving+Ux4
◧◩◪◨
82. rtkwe+1B1[view] [source] [discussion] 2022-10-17 13:58:49
>>rfrec0+NB
That's a violation, but not what SD is doing. It's not copying; it's recreating a similar (sometimes extremely similar) image.
◧◩◪◨⬒⬓⬔⧯
83. sidewn+kE1[view] [source] [discussion] 2022-10-17 14:15:57
>>rfrec0+IC
The DMCA disagrees. Specific methods of "circumvention" which inevitably take the form of a software tool are prohibited. Tools and their authors can be held accountable.
◧◩◪◨⬒
84. omnimu+tN1[view] [source] [discussion] 2022-10-17 14:55:23
>>matkon+hc
But it means the models were trained on images that are under copyright. In fact, many of these models were trained exclusively on such images without any permission. For example, Midjourney is clearly trained on everything on artstation.com, where almost all images have a commercial purpose or license.
◧◩◪◨
85. rtkwe+VO1[view] [source] [discussion] 2022-10-17 15:01:02
>>didibu+4s
The real kink in applying "derivative work" here, to me, is that the entire dataset goes into the model and is, to some vanishingly small extent, used in every output. How can we meaningfully assign ownership through that transition and mixing? And when we do, how do we do it without exacerbating the existing problems of copyright in art? We already can't use characters and settings created during our own lifetimes in our own expression, because Disney got life + 70 through Congress.
◧◩◪◨
86. kmeist+cU2[view] [source] [discussion] 2022-10-17 19:47:00
>>lofatd+Xu
Suing Microsoft for training Copilot on your code would require jumping over the same hurdle that the Authors Guild could not: i.e. that it is fair use to scan a massive corpus[0] of texts (or images) in order to search through them.

My real worry is downstream infringement risk, since fair use is non-transitive. Microsoft can legally provide you a code generator AI, but you cannot legally use regurgitated training set output[1]. GitHub Copilot is creating all sorts of opportunities to put your project in legal jeopardy and Microsoft is being kind of irresponsible with how they market it.

[0] Note that we're assuming published work. Doing the exact same thing Microsoft did, but on unpublished work (say, for irony's sake, the NT kernel source code) might actually not be fair use.

[1] This may give rise to some novel inducement claims, but the irony of anyone in the FOSS community relying on MGM v. Grokster to enforce the GPL is palpable.

◧◩◪◨⬒
87. ISL+wx3[view] [source] [discussion] 2022-10-17 23:39:50
>>bugfix+VE
Find a good and ambitious copyright attorney with some free capacity.

Also, register your code with the copyright office.

Edit: Apparently, with the #1 post on HN right now, you could also just go here: https://githubcopilotinvestigation.com/

◧◩◪◨⬒⬓⬔⧯
88. monoca+lG3[view] [source] [discussion] 2022-10-18 00:44:29
>>Americ+Yd
Google v. Oracle ended with a six-line function not being granted de minimis protection. What you're talking about is arguably common sense, but it's not based on current case law in the US.
◧◩◪◨⬒
89. keving+Ux4[view] [source] [discussion] 2022-10-18 09:39:23
>>rtkwe+Uv1
I think the key detail is to look at what happened in the bottom left - in the original drawing, there's dark blue (due to lighting) cloth filling the scene, but the network has instead generated oddly-hued water there, even though on the right side there's sand from the beach shore. There's seemingly no geometric representation driving the AI so it ended up turning clothing into mystery ocean water when synthesizing an image that (for whatever reason) looked like the original one. It's an interesting error to me because it only looks Wrong once you notice the sand on the right.
◧◩◪◨
90. matkon+5r5[view] [source] [discussion] 2022-10-18 15:04:21
>>rtkwe+Se
It is a clear case of derivative work (see also https://commons.wikimedia.org/wiki/Commons:Derivative_works - internal docs, but their explanation of copyright status tends to be well done)
◧◩◪◨⬒
91. matkon+pr5[view] [source] [discussion] 2022-10-18 15:05:16
>>Fillig+ml
It is a clear case of derivative work (see also https://commons.wikimedia.org/wiki/Commons:Derivative_works - internal docs, but their explanation of copyright status tends to be well done)

This specific one would not be a problem, but doing it with a still copyrighted work would be.

[go to top]