Let me be perfectly clear. I'm all for the tech. The capabilities are nice. The thing I'm strongly against is training these models on any data without any consent.
GPT-3 is OK, training it with public stuff regardless of its license is not.
Copilot is OK, training it on GPL/LGPL-licensed code without consent is not.
DALL-E/MidJourney/Stable Diffusion is OK. Training it with non public domain or CC0 images is not.
"We're doing something amazing, hence we need no permission" is ugly to put it very lightly.
I've left GitHub because of Copilot. Will leave any photo hosting platform if they hint at anything similar with my photography, period.
Those are effectively cases of cryptomnesia[0]. Part and parcel of learning.
If you don't want broad access to your work, don't upload it to a public repository. It's very simple. Good on you for recognising that you don't agree with how GitHub looks at data in public repos, but it's not their problem.
Disagree, outputting training data as-is is not cryptomnesia. This is not Copilot's first case. It also reproduced id Software's fast inverse square root function as-is, including its comments, but without its license.
> If you don't want broad access to your work, don't upload it to a public repository. It's very simple.
This is actually both funny and absurd. This is why we have licenses in the first place. If all the licenses are moot, then this opens a very big can of worms...
My terms are simple. If you derive, share the derivation with the same license (xGPL). Copilot is deriving my code. If you use my code as a derivation point, honor the license, mark the derivation with the GPL license. This voids your business case? I don't care. These are my terms.
If any public item can be used without any limitations, Getty Images (or any other stock photo business) is illegal. CC licensing shouldn't exist. GPL is moot. Even the most litigious software companies' cases (Oracle, SCO, Microsoft, Adobe, etc.) are moot. Just don't put it on public servers, eh?
Similarly, music and other fine arts are generally publicly accessible. So copyright on any and every production is also invalid as you say, because it's publicly available.
Why not put your case to the attorneys of Disney, WB, Netflix and others? I'm sure they'll provide all their archives for training your video/image AI. Similarly Microsoft, Adobe, Mathworks, et al. will be thrilled to support your Copilot competitor with their code, because a) Any similar code will be just cryptomnesia, b) The software produced from that code is publicly accessible anyway.
At this point, I haven't even touched on the fact that humans learn very differently from how neural networks are trained.
Outputting training data as-is without attribution is just plain plagiarism. You don't get to put verbatim text from textbooks in your academic papers either.
But your reasoning boils down to I don't like it so it mustn't be that way. That's never been necessarily true.
At any rate, piracy is rampant, so clearly a large body of people don't think even direct copies are morally wrong. Let alone something similar.
You're acting as though there are constant won and lost cases over plagiarism. Ed Sheeran seems to defend his work weekly. Every case that goes to court means reasonable minds differ on the interpretation of plagiarism legally.
So what's your point?
Because it seems the main thrust of your argument is I should argue with Microsoft instead (*who own GitHub lol*)? That's all you got to hold back the tide of AI? An appeal to authority?
That’s actually fine (kind of the idea of specifying a license). What is not fine is using that code in non-GPL licensed code.
We are talking ‘de facto’ here, not ‘de jure’. It may be legally problematic, but anything made public once is never going back in the box.
That sounds like a you problem, not a us problem.
As of yet, no court has said that any of this is illegal.
So tough luck. Go take it to the Supreme Court if you disagree, because right now it actually seems like people can do almost whatever they want with these AI tools.
Your objection simply doesn't matter, until there is a court case that supports you. You can't do anything about it, if that doesn't happen.
This stance allows me to do whatever I want with any software or work you put out there, regardless of the license you attach to it, since it's your problem, not mine.
However, this is not how I operate ethically.
> As of yet, no court has said that any of this is illegal.
I assume this will be tested somehow, sometime. So I'm investing in popcorn futures.
> Your objection simply doesn't matter, until there is a court case that supports you. You can't do anything about it, if that doesn't happen.
You know, this goes both ways. Same will be very valid for your works, through your own reasoning.
I'm not claiming that they did. What I said is, Copilot emitted the exact implementation in id Software's repository, incl. all comments and everything.
> But your reasoning boils down to I don't like it so it mustn't be that way. That's never been necessarily true.
If you interpret my comment with that depth and breadth, I can only say that you are misinterpreting completely. It's not about my personal tastes, it's about ethical frameworks and social contracts.
> At any rate, piracy is rampant, so clearly a large body of people don't think even direct copies are morally wrong. Let alone something similar.
I believe if you listen to a street musician for a minute, you owe them a dollar. Scale up from there. BTW, I'm a former orchestra player, so I know what making and performing music entails.
> You're acting as though there are constant won and lost cases over plagiarism. Ed Sheeran seems to defend his work weekly. Every case that goes to court means reasonable minds differ on the interpretation of plagiarism legally.
When there's a strict license on how a work can be used, and the license is violated, it's a clear case. The AI is just a derivation engine, and the license requires that derivations carry the same license. I don't care if you derive my code. I care if you derive my code and hide the derivations from the public.
It's funny that you're defending closed-sourcing free software at this point. This is a neat full circle.
> So what's your point?
All research and science should be ethical. AI research is not something special which allows these ethical norms and guidelines (which have been established over decades if not centuries) to be suspended. If medical researchers acted with a quarter of this laissez-faire attitude, they'd be executed with a slow death. If security researchers acted with an eighth of this recklessness, their careers would be ruined.
> That's all you got to hold back the tide of AI?
As I mentioned before, I'm not against AI. It just doesn't excite me as a person who knows how it works and what it does, and the researchers' attitude is leaving a bad taste in my mouth.
Actually, no it doesn't. This topic is about AI training on code.
Courts have not held that this is illegal.
But there are absolutely other things, that people might do with code, that break copyright law.
> it's your problem, not mine.
Oh, but it would be your problem as well, if you break the law, and someone else sues you for it.
That's the difference. AI training is not against the law. Other things, that you are imagining in your head right now, very well could be, and you could lose.
> Same will be very valid for your works
Not if what you are hypothetically doing breaks the law, and AI training doesn't break the law.
So that's the difference, which makes the reasoning legitimate.
Actually yes. I'm not against the tech. I'm against using my code without consent for a tool which allows breaching the license I put my code under.
IOW, if Copilot understood code licenses and prevented intermixing incompatibly licensed code while emitting results for my repository, I might have a slightly different stance on the issue.
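A rough sketch of what such a license-aware filter could look like, purely as an illustration. Everything here is an assumption: the `Suggestion` type, the `filter_suggestions` function and the tiny `ALLOWED_TARGETS` table are hypothetical, the table is deliberately incomplete, and nothing here reflects how Copilot actually works.

```python
from dataclasses import dataclass

# Hypothetical, deliberately incomplete table for illustration only:
# source license -> target licenses a snippet may be emitted into.
# Real compatibility has more conditions (e.g. MIT still requires
# keeping the copyright notice), which this sketch ignores.
ALLOWED_TARGETS = {
    "CC0-1.0": {"CC0-1.0", "MIT", "GPL-3.0-only", "Proprietary"},
    "MIT": {"MIT", "GPL-3.0-only", "Proprietary"},
    "GPL-3.0-only": {"GPL-3.0-only"},
}


@dataclass(frozen=True)
class Suggestion:
    """A candidate snippet plus the license of the code it was derived from."""
    code: str
    source_license: str


def filter_suggestions(suggestions, target_license):
    """Keep only suggestions whose source license permits the target license.

    Snippets with an unknown source license are dropped rather than guessed at.
    """
    return [
        s for s in suggestions
        if target_license in ALLOWED_TARGETS.get(s.source_license, set())
    ]


if __name__ == "__main__":
    candidates = [
        Suggestion("gpl_snippet()", "GPL-3.0-only"),
        Suggestion("mit_snippet()", "MIT"),
        Suggestion("mystery_snippet()", "Unknown"),
    ]
    for target in ("MIT", "GPL-3.0-only"):
        kept = [s.code for s in filter_suggestions(candidates, target)]
        print(target, "->", kept)
```

With a filter like this, a GPL-derived snippet would only ever be offered inside a GPL target, which is the behaviour described above.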
Laws are just a codified version of ethics. Just because something isn't codified in law, it doesn't mean it's ethically correct, and I hold ethics over laws. Some people call this conscience, others call this honor.
Just because something isn't deemed illegal doesn't mean it's ethical. These are different things. The world has worked under honor and ethical codes for a very long time, and still works under these unwritten laws in a lot of areas.
Science, software and other frontiers value ethics and principles a great deal. Some niches like AI largely ignore these, and I find this disturbing.
However, some people prefer to play the game with the written rules only, and as I said, I'm investing in popcorn futures to see what's gonna happen to them.
I might tank and go bankrupt of course, but I will sleep better at night for sure, and this is more important for me at the end.
I'm passionate about computers, yes. This is also my job, yes, but I'm not the person who'll do reckless things just because an incomplete code of written ethics doesn't prevent me from doing them.
I'd rather not do to anyone what I wouldn't want done to me. IOW, I sow only the seeds which I want to reap.
And a quite reasonable code of ethics is that people do not have absolute, complete control over their intellectual property, and instead only have the ability to control it in certain circumstances.
Things like fair use, which makes this legal, exist for many very good reasons.
So yes, the code of ethics that society has decided on includes perfectly reasonable exceptions, such as fair use, and it is your problem, not ours, that you have some ridiculous idea that people should have complete, 100% authoritarian control over their IP.
And no, people not having infinite control over IP, does not allow you to extend this reasonable exception, to you being able to do literally anything to other people's IP.
What I say with the GPL license is clear:
If you derive anything from this code base, you're agreeing and obliged to carry this license to the target code base (The logical unit in this case is a function in most cases).
So the case is clear. AI is a derivation engine. What you obtain is a derivation of my GPL licensed code. Carry the license, or don't use that snippet, or in AI's case, do not emit GPL derived code for non-GPL code bases.
This is all within the accepted ethics & law. Moreover, it's court tested too.
People are not agreeing though.
They are not agreeing, because there is a perfectly reasonable ethical and legal principle called fair use, which society has determined allows people to engage in limited use of other people's IP, no matter what the license says.
> Carry the license, or don't use
Or, instead of that, people could reasonably use fair use, and ignore the license, as fair use exists for many good legal and ethical reasons.
And no, you do not get to extend that out, to doing anything you want to do, just because there is a reasonable exception called fair use.
> do not emit GPL derived code for non-GPL code bases
Or, actually, yes do this. This is allowed because of the reasonable ethical and moral principle called fair use, which allows people to ignore your license.
Thanks for the discussion, and have a nice day.
I may not further comment on this thread from this point.