zlacker

13 comments
1. leni53+(OP)[view] [source] 2022-10-17 05:34:29
If that's true, then GitHub is just "washing its hands". Not at all reassuring for copyright holders and users of Copilot.
replies(3): >>helsin+84 >>yardst+q4 >>hnbad+Rc
2. helsin+84[view] [source] 2022-10-17 06:22:26
>>leni53+(OP)
That code seems to appear in thousands of repositories on GitHub; I’m sure some of them haven’t copied the license.

The vast majority of people who would use a matrix transform function they got from code completion (or from a GitHub or Stack Overflow search) probably don’t care what the license is. They’ll just paste in the code. To many developers, publicly viewable code is in the public domain. Copilot just shortens the search by a few seconds.

Microsoft should try to do better (I’m not sure how), but the sad fact is that trying to enforce a license on a code fragment is like dropping dollar bills on the sidewalk with a note pinned to them saying “do not buy candy with this dollar”.

replies(1): >>extrop+k6
3. yardst+q4[view] [source] 2022-10-17 06:25:20
>>leni53+(OP)
What’s the most GitHub could reasonably be expected to do? If multiple licenses are found for the same code, then maybe it should be flagged for review, or the most restrictive license applied.
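
A minimal sketch of that flagging idea, assuming copies are matched by an exact hash after whitespace normalization and that licenses can be ranked by a hypothetical `RESTRICTIVENESS` table. Both are illustrative assumptions, not anything GitHub is known to do, and the ranking is not a legal classification:

```python
import hashlib
from collections import defaultdict

# Hypothetical ordering, higher = more restrictive. Illustrative only.
RESTRICTIVENESS = {"MIT": 1, "Apache-2.0": 2, "LGPL-3.0": 3, "GPL-3.0": 4}


def fingerprint(code: str) -> str:
    """Hash the code with whitespace collapsed so reformatted copies still match."""
    normalized = " ".join(code.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_license_conflicts(snippets):
    """Group (repo, license, code) tuples by fingerprint and report the groups
    that carry more than one license, along with the most restrictive one."""
    groups = defaultdict(lambda: {"licenses": set(), "repos": []})
    for repo, license_id, code in snippets:
        entry = groups[fingerprint(code)]
        entry["licenses"].add(license_id)
        entry["repos"].append(repo)

    conflicts = {}
    for fp, entry in groups.items():
        if len(entry["licenses"]) > 1:
            # Unknown licenses are treated as the most restrictive, to be safe.
            strictest = max(
                entry["licenses"],
                key=lambda lic: RESTRICTIVENESS.get(lic, float("inf")),
            )
            conflicts[fp] = {
                "licenses": entry["licenses"],
                "repos": entry["repos"],
                "most_restrictive": strictest,
            }
    return conflicts


if __name__ == "__main__":
    # Made-up repositories for illustration.
    sample = [
        ("a/x", "MIT", "def f():\n    return 1"),
        ("b/y", "GPL-3.0", "def f():  return 1"),
    ]
    for info in find_license_conflicts(sample).values():
        print(info["repos"], "flag for review; strictest:", info["most_restrictive"])
```

An exact hash only catches near-verbatim copies; anything renamed or restructured would need fuzzier matching.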
replies(3): >>samast+k9 >>kitsun+Of >>bjourn+Aa2
◧◩
4. extrop+k6[view] [source] [discussion] 2022-10-17 06:47:57
>>helsin+84
I still remember the days when we had billion-dollar lawsuits over 20 lines of code (Oracle vs Google).

If Copilot makes everyone see how ridiculous that is, that's a win in my book.

◧◩
5. samast+k9[view] [source] [discussion] 2022-10-17 07:20:25
>>yardst+q4
Check timestamps of commits of replicated code to find the original.
replies(2): >>barson+qc >>LelouB+9o
◧◩◪
6. barson+qc[view] [source] [discussion] 2022-10-17 07:55:45
>>samast+k9
That would only work if the original was uploaded to GitHub before the copies. Like, somebody could copy from GitLab or BitBucket. And git histories don’t always help if they’re not copied over.
replies(1): >>lokedh+4h
7. hnbad+Rc[view] [source] 2022-10-17 07:59:23
>>leni53+(OP)
Yes. GitHub can get away with "oh well, we're all learning" because if the code is violating copyright, it's the user who is infringing directly by publishing it, not GitHub via Copilot. Either the user would have to bring a case against GitHub demonstrating liability (good luck) or the copyright holder would have to bring a case against GitHub demonstrating copyright violation (again, good luck). Otherwise this is entirely between the copyright holder and the Copilot user, legally speaking.

Of course, if someone does manage to set a precedent that including copyrighted works in AI training data without an explicit license to do so is infringement, GitHub Copilot would be screwed and at best have to start over with a blank slate if they can't be grandfathered. But this would affect almost all products based on the recent advancements in AI, and they're backed by fairly large companies (after all, GitHub is owned by Microsoft, a lot of the other AI stuff traces back to Alphabet, and there are a lot of startups funded by huge and influential VC companies). Given the US's history of business-friendly legislation, I doubt we'll see copyright laws being enforced against training data unless someone upsets Disney.

◧◩
8. kitsun+Of[view] [source] [discussion] 2022-10-17 08:31:39
>>yardst+q4
If it's possible for video and audio content (ContentID, YT), then I don't see why it shouldn't be possible for OSS.
replies(1): >>rocqua+al
◧◩◪◨
9. lokedh+4h[view] [source] [discussion] 2022-10-17 08:44:54
>>barson+qc
But copyright law doesn't really care about how you prevent infringement, just that it doesn't happen. Isn't it up to GitHub to come up with a way to do it, or otherwise not do it at all?
replies(2): >>yardst+jG >>minhaz+0o3
◧◩◪
10. rocqua+al[view] [source] [discussion] 2022-10-17 09:27:01
>>kitsun+Of
Do we want that though? I personally believe copyright as implemented today is harmful. The fact that code largely is able to dodge this could be seen as arguing we should be laxer with copyright, rather than arguing for strict enforcement of copyright on code.
◧◩◪
11. LelouB+9o[view] [source] [discussion] 2022-10-17 10:08:00
>>samast+k9
Timestamps of commits can't be trusted, just like commit authors.

GitHub can only trust push timestamps.
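
A rough sketch of the difference, assuming the host records its own `push_time` for every copy. The `Copy` record and its field names are hypothetical, not part of any real GitHub API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Copy:
    repo: str
    commit_time: datetime  # written by the client; can be set to anything
    push_time: datetime    # recorded by the host when the push arrived


def earliest_by_push(copies):
    """Return the copy the host saw first, ignoring client-supplied commit dates."""
    return min(copies, key=lambda c: c.push_time)


if __name__ == "__main__":
    # Made-up data: the second repo backdated its commit but pushed later.
    copies = [
        Copy("alice/matrix",
             commit_time=datetime(2015, 3, 1, tzinfo=timezone.utc),
             push_time=datetime(2015, 3, 2, tzinfo=timezone.utc)),
        Copy("mallory/matrix",
             commit_time=datetime(2010, 1, 1, tzinfo=timezone.utc),
             push_time=datetime(2021, 6, 5, tzinfo=timezone.utc)),
    ]
    print(earliest_by_push(copies).repo)
```

This still only identifies the first copy pushed to this host, so it says nothing about code that was first published somewhere else.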

◧◩◪◨⬒
12. yardst+jG[view] [source] [discussion] 2022-10-17 12:41:30
>>lokedh+4h
GitHub just needs to show they have taken reasonable precautions, and if a conflict is identified, that they remediate it without undue delay.

It’s not a binary of all perfect or nothing at all. The law looks at intent, and so doesn’t punish mistakes or errors so long as you aren’t being malicious, reckless, or negligent.

◧◩
13. bjourn+Aa2[view] [source] [discussion] 2022-10-17 19:30:39
>>yardst+q4
The point is that Copilot should not emit a word-for-word copy of someone else's work because that is called plagiarism.
◧◩◪◨⬒
14. minhaz+0o3[view] [source] [discussion] 2022-10-18 04:50:48
>>lokedh+4h
GitHub is protected by Section 230, which states:

> No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider

So the act of hosting copyrighted content is not actually a copyright violation for GitHub. They're not obligated to preemptively determine who the original copyright owner of some piece of code is, as they're not the judge of that in the first place. Even if you complain that someone stole your code, how is GitHub supposed to know who's lying? Copyright is a legal issue between the copyright holder and the copyright infringer. So the only thing GitHub is required to do is to respond to DMCA takedown notices.
