zlacker

[return to "GitHub Copilot, with “public code” blocked, emits my copyrighted code"]
1. kweing+v6[view] [source] 2022-10-16 20:27:21
>>davidg+(OP)
I’ve noticed that people tend to disapprove of AI trained on their profession’s data, but are usually indifferent or positive about other applications of AI.

For example, I know artists who are vehemently against DALL-E, Stable Diffusion, etc. and regard it as stealing, but they view Copilot and GPT-3 as merely useful tools. I also know software devs who are extremely excited about AI art and GPT-3 but are outraged by Copilot.

For myself, I am skeptical of intellectual property in the first place. I say go for it.

2. bayind+Fd[view] [source] 2022-10-16 21:31:39
>>kweing+v6
I, with my software developer hat on, am not excited by AI. Not a bit, honestly. Especially about these big models trained on huge amounts of data without any consent.

Let me be perfectly clear. I'm all for the tech. The capabilities are nice. The thing I'm strongly against is training these models on any data without any consent.

GPT-3 is OK; training it on public text regardless of its license is not.

Copilot is OK; training it on GPL/LGPL-licensed code without consent is not.

DALL-E/MidJourney/Stable Diffusion are OK; training them on images that are not public domain or CC0 is not.

"We're doing something amazing, hence we need no permission" is ugly, to put it very lightly.

I've left GitHub because of Copilot, and I will leave any photo hosting platform that hints at anything similar with my photography, period.

3. psychp+6g[view] [source] 2022-10-16 21:56:19
>>bayind+Fd
I disagree.

Those are effectively cases of cryptomnesia[0]. Part and parcel of learning.

If you don't want broad access to your work, don't upload it to a public repository. It's very simple. Good on you for recognising that you don't agree with how GitHub treats data in public repos, but that's not their problem.

[0] https://en.m.wikipedia.org/wiki/Cryptomnesia

4. bayind+Cl[view] [source] 2022-10-16 22:45:59
>>psychp+6g
> Those are effectively cases of cryptomnesia.

Disagree; outputting training data verbatim is not cryptomnesia. And this is not Copilot's first such incident: it has also reproduced id Software's fast inverse square root function as-is, including its comments, but without its license.
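For context, the function being discussed is the famous bit hack from the Quake III Arena source release. The sketch below is an illustration of the technique, not the verbatim id Software code (it uses a union instead of the original's pointer punning, and omits the original comments):

```c
#include <stdint.h>

/* Fast inverse square root: approximates 1/sqrt(x) for positive floats.
   Illustrative sketch of the well-known Quake III technique. */
float q_rsqrt(float number)
{
    union { float f; uint32_t i; } conv = { .f = number };

    /* The magic constant plus a bit shift of the float's raw bit
       pattern yields a surprisingly good first approximation. */
    conv.i = 0x5f3759df - (conv.i >> 1);

    /* One iteration of Newton's method refines the estimate. */
    conv.f *= 1.5f - (number * 0.5f * conv.f * conv.f);
    return conv.f;
}
```

After one Newton step the result is accurate to within roughly 0.2%. The point in the thread is not the algorithm itself but that Copilot emitted the exact licensed text of this function, comments included.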

> If you don't want broad access your work, don't upload it to a public repository. It's very simple.

This is actually both funny and absurd. Licenses exist precisely for this reason. If all licenses are moot, that opens a very big can of worms...

My terms are simple: if you derive, share the derivation under the same license (xGPL). Copilot is deriving my code. If you use my code as a starting point, honor the license and release the derivation under the GPL. Does that void your business case? I don't care. These are my terms.

If any publicly accessible item can be used without limitation, then Getty Images (or any other stock photo business) has no legal basis, CC licensing shouldn't exist, the GPL is moot, and even the most litigious software companies' cases (Oracle, SCO, Microsoft, Adobe, etc.) are moot. Just don't put it on public servers, eh?

Similarly, music and other fine arts are generally publicly accessible, so by your logic copyright on any and every production is also invalid, because it's publicly available.

Why not put your case to the attorneys of Disney, WB, Netflix and the others? I'm sure they'll provide their entire archives for training your video/image AI. Similarly, Microsoft, Adobe, Mathworks, et al. will be thrilled to support your Copilot competitor with their code, because a) any similar code will be just cryptomnesia, and b) the software produced from that code is publicly accessible anyway.

And at this point I haven't even touched on the fact that humans learn very differently from neural networks.

5. psychp+6o[view] [source] 2022-10-16 23:08:30
>>bayind+Cl
It's funny that you bring up id's fast inverse square root. Carmack certainly didn't come up with the algorithm or the magic number.

But your reasoning boils down to "I don't like it, so it mustn't be that way." That has never necessarily been true.

At any rate, piracy is rampant, so clearly a large body of people don't think even direct copies are morally wrong, let alone something merely similar.

You're acting as though plagiarism cases are constantly being won and lost. Ed Sheeran seems to defend his work weekly. Every case that goes to court means reasonable minds differ on the legal interpretation of plagiarism.

So what's your point?

Because it seems the main thrust of your argument is that I should argue with Microsoft instead (*who own GitHub, lol*)? That's all you've got to hold back the tide of AI? An appeal to authority?

6. bayind+x21[view] [source] 2022-10-17 06:59:05
>>psychp+6o
> It's funny to say id's fast inverse square root. Conway certainly didn't come up with the algorithm or the magic number.

I'm not claiming that they did. What I said is that Copilot emitted the exact implementation from id's repository, including all the comments and everything.

> But your reasoning boils down to I don't like it so it mustn't be that way. That's never been necessarily true.

If that's your reading of my comment, I can only say you're misinterpreting it completely. It's not about my personal tastes; it's about ethical frameworks and social contracts.

> At any rate piracy is rampant so clearly a large body of people don't think even a direct copies is morally wrong. Let alone something similar.

I believe if you listen to a street musician for a minute, you owe them a dollar. Scale up from there. BTW, I'm a former orchestra player, so I know what making and performing music entails.

> You're acting as though there are constant won and lost cases over plagiarism. Ed Sheeran seems to defend his work weekly. Every case that goes to court means reasonable minds differ on the interpretation of plagiarism legally.

When there's a strict license on how a work can be used and that license is violated, it's a clear case. This AI is just a derivation engine, and the license requires that derivations carry the same license. I don't care if you derive my code. I care that you derive my code and hide the derivation from the public.

It's funny that you're now defending the closed-sourcing of free software. We've come neatly full circle.

> So what's your point?

All research and science should be ethical. AI research is not something special that allows these ethical norms and guidelines (established over decades, if not centuries) to be suspended. If medical researchers acted with a quarter of this laissez-faire attitude, they'd be crucified. If security researchers acted with an eighth of this recklessness, their careers would be ruined.

> That's all you got to hold back the tide of AI?

As I said before, I'm not against AI. It just doesn't excite me, as someone who knows how it works and what it does, and the researchers' attitude leaves a bad taste in my mouth.
