zlacker

1. theRea+(OP)[view] [source] 2023-06-10 16:32:12
Nobody uses Copilot intentionally to violate copyright law. People do use crypto mixers intentionally to violate money laundering laws.
replies(2): >>SpicyL+04 >>cmrdpo+Cc6
2. SpicyL+04[view] [source] 2023-06-10 16:50:59
>>theRea+(OP)
Nobody affirmatively says “yes, my goal is to violate copyright law, and Copilot is the best tool I’ve found”. But it doesn’t seem impossible to me that the value of Copilot comes partially from the fact that it can copy-paste code from copyrighted repositories in ways which would be illegal for you or me to do. I’m not sure it’s proven yet, but I wouldn’t be shocked if it is in the future.
replies(1): >>shagie+yf
3. shagie+yf[view] [source] [discussion] 2023-06-10 17:56:04
>>SpicyL+04
It provides the same value as someone who copies and pastes code from Stack Overflow or any of the predecessors without concerning themselves with the license.

I am certain that I can find code from Linux or gcc or emacs on Stack Overflow that is under a GPL license and not compatible with the CC license Stack Overflow uses... and yet it's there. What's more, people will copy that code into their own projects, ignoring the CC license too.

How is that really any different from using Copilot, if the original license and attribution are something to respect?

Note that I do think the original license is something to respect. That's why, for any code I write where the copyright matters (toy program for home? meh. Hobby project repo that I'm working on that I'll publish? yep. Employer's code for work? absolutely.), I either don't touch questionable sources or run a license check on whatever I take from them.

The key thing is that I don't consider the use of Copilot to be any more controversial than copying from Stack Overflow - which has been done by countless programmers for a decade before Copilot existed and no one got up in arms about it then.

replies(1): >>cmrdpo+ge6
4. cmrdpo+Cc6[view] [source] 2023-06-12 16:07:30
>>theRea+(OP)
Copilot is a product -- at least indirectly -- of Microsoft, a company that for about a decade made very public pronouncements about how it disagreed with the GPL (or copyleft generally), found it problematic, and actively tried to discourage its use.

Today's MS isn't really the same, and they've clearly made their peace with Linux. But it still happens that the GPL is in some fundamental ways at odds with commercial exploitation of open source code. So any corporate entity is going to struggle with it, because at best it requires being very careful in distribution, or trying to negotiate or cut a deal with the licensor. At worst it can lead to legal problems and IP leakage on your own product.

So, not claiming any conspiracy, or any intent to violate. But it is convenient for companies like MS/OpenAI/GitHub to treat open source work as effectively public domain rather than under copyright, and to push the limits there.

The risk to an employer is of course the accidental introduction of such copylefted material into their codebase through Copilot or similar tools.

I suspect two sources of disconnect with the broader community on Hacker News, which doesn't seem to see the issue here:

a) Many of the folks on this forum are working in the full-stack/web space, where fundamentally novel, patented, or conceptually difficult algorithms and data structures are rare. For them Copilot is an absolute blessing in helping to reduce the tedium of boilerplate. However, in the embedded systems, operating systems, compiler, game engine dev, database internals etc. world there are other aspects at work. In certain contexts, Copilot has been shown to reproduce complicated or difficult code taken from copyrighted or copylefted (or maybe even patented) sources without attribution. And apparently now with some explicit obfuscation.

To put it another way: it's unlikely that Copilot's going to violate licenses when it helps you turn your value/model objects from one structure into another, or write a call into a SQL ORM. But it's quite possible that if I'm writing a DB join algorithm or some complicated math in a rendering engine or a compiler optimization phase, it could "crib notes" from a source under a restrictive license... because those things are absolutely in its training set and the LLM doesn't "know" about the licensing behind them.

b) Either misunderstanding of, or lack of knowledge of, or outright hostility to... copylefted or attribution licenses which require special handling.

5. cmrdpo+ge6[view] [source] [discussion] 2023-06-12 16:13:59
>>shagie+yf
Browsing Stack Overflow, and even blindly copying and pasting from it, is an intentional action that follows from research by the user, and the source of the material pasted is known or discoverable.

Using Copilot is an automated process, and the source of the material used in training is deeply obfuscated inside the model.

That's why I make the analogy back to cryptocurrency mixers.
