If the issue is more specifically copyright infringement, then leverage the legal apparatus in place for that. Their lawyers might listen better.
This is not a strongly held opinion and if you disagree I would love to hear your constructive thoughts!
That's not considering any legal / license issues, just a simple statement about the data used to train CP.
If I create something, I get to define the terms of its use, reproduction, distribution, etc. "Value" plays no part in whether someone can appropriate and distribute that creation without permission from the creator.
Take a single C file or even a long function from the leaked Windows NT codebase and include it in your code. See how happy Microsoft will be with it. They spent millions of dollars on their legal teams. Eroding copyright protections will harm the weakest most. How many open source contributors can afford copyright lawyers?
The best way to make sure your code isn’t copied is not to publish it.
Please refrain from this kind of blatant gaslighting. You're not the one to assess its value or usefulness, and your point is at most tangential to the issue. The problem is that the model systematically took non-public-domain code without any permission from the authors, not whether it's useful or not. This complaint is worth hearing, and the Copilot team should be more accountable for this problem, since it could lead to more serious copyright infringement fights for its users.
Right or wrong, copyright doesn't care about how valuable something is. Everything is equally (not in reality but in theory) protected. GitHub is a platform many people have trusted with protecting ownership of their copyrighted code through reasonable levels of security.
I think the big discussion point here is around ensuring that this tool is acting correctly and respecting the rights of individuals. It's very easy for a large company to accidentally step on people and not realise it or brush it away. People want to make sure that isn't happening, and right now there are some very compelling examples where it looks like it is. The fact that this isn't opt-in and there's no way to opt out your public repositories means the choice has been taken away from people. Previously you were free to license your code as you saw fit; now we have some examples where that license may not be respected as a result of an expensive GitHub feature.
I think this is where the conversation is centring. It's not about whether your code is valuable or not. It's whether a large company is making profit by stepping on an individual's right of ownership or not.
On the note of leveraging legal apparatus to figure it out, I think you're right. The problem is: what individual open source maintainer is going to have the funds to bring a reasonably equal legal challenge against such a large organisation? I maintain a relatively well used open source project and I sure as hell don't. Realistically my option is to either spend a lot of personal time and resources to challenge it (if I think wrong-doing is happening) or just suck it up. Given that there's no easy way to figure out if wrong-doing is happening because it's all in the AI soup, it makes it even harder to consider that approach.
I think the point is a lot less about the value of the code, and much more about a massive organisation playing fast and loose with an individual's rights.
None of this is to say GitHub have actually done anything wrong here. I'm sure we'll figure that out in time, but it would be great if they could figure out a way to provide more concrete explanations.
It doesn't matter if I think my code is valuable; it's that GitHub is using everyone's code for their own profit, without opt-in, attribution, or paying for a license.
I always found this weird while I was working at this company, but then, they have no reason to care about ephemeral threats that have never been brought to bear in a meaningful way. No consequences = no reason to spend literal billions retooling the entire tech side of your company over a decade.
It basically happens like this:
"Oh this code solves our problems and has a nice community around it for network effects!"
**developers proceed to adopt codebase without checking the license**
**months later**
"Oh, huh this license has some interesting language in it..."
Then the employee doesn't mention it, because the risk of having to re-do a bunch of work feels higher than the risk of getting in trouble for violating a license. Basically, unless it's Oracle, people just kinda shrug it off as a "wontfix".
My whole thing is that any system depending on people to read and follow a license is quite flawed in terms of enforcement, and is largely designed specifically so that powerful incumbents can make claims, not individual developers.
Laws have to be enforced or people will ignore them. If there's no practical way to enforce a law that doesn't involve violating freedoms, you're kinda fucked.