For example, I know artists who are vehemently against DALL-E, Stable Diffusion, etc. and regard them as stealing, but who view Copilot and GPT-3 as merely useful tools. I also know software devs who are extremely excited about AI art and GPT-3 but are outraged by Copilot.
For myself, I am skeptical of intellectual property in the first place. I say go for it.
When Microsoft steals all code on their platform and sells it, they get lauded. When "Open" AI steals thousands of copyrighted images and sells them, they get lauded.
I am skeptical of imaginary property myself, but fuck this: one set of rules for the powerful, another set of rules for the masses.
I haven't been following super closely, but I don't know of any claims or examples where input images were recreated to a significant degree by Stable Diffusion.
I think that the argument being made by some artists is that the training process itself violates copyright just by using the training data.
That’s quite different from arguing that the output violates copyright, which is what the tweet in this case was about.
Now, a human can take inspiration from like 100 different sources and probably end up with something that no one would recognize as derivative of any of them. But it also wouldn't be obvious that the human did that.
But with an ML model, it's clearly a derivative in that the learned function is mathematically derived from its dataset, and so are all of its outputs.
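The claim above can be made concrete with a toy example. This is a minimal sketch (the `fit` function and the datasets are hypothetical, chosen only for illustration): the parameters learned by gradient descent are a deterministic function of the training data, so swapping the dataset changes the learned function, and with it every output the model produces.

```python
def fit(data, lr=0.1, steps=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(steps):
        # Gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Two different "training sets" yield two different learned functions.
w_a = fit([(1.0, 2.0), (2.0, 4.0)])   # points drawn from y = 2x
w_b = fit([(1.0, 3.0), (2.0, 6.0)])   # points drawn from y = 3x

print(w_a)  # converges to ~2.0
print(w_b)  # converges to ~3.0
```

The learned weight is entirely determined by the data it was fit on; in that narrow mathematical sense, every output of the trained model is "derived" from the training set, whether or not any single output visibly resembles a training example.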
I think this raises a new question, though, because until now "derivative" kind of implied that the output was recognizable as being derived.
With AI, you can tweak it so the output doesn't end up being easily recognizable as derived, but we know it's still derived.
Personally, I think what really matters is what the legal framework around it should be. How do we balance the interests of AI companies against those of the developers, artists, and citizens who authored the datasets that enabled the AI to exist? And what rights should each party be given?