GitHub Copilot, with “public code” blocked, emits my copyrighted code

>>davidg+(OP)
Howdy, folks. Ryan here from the GitHub Copilot product team. I don’t know how the original poster’s machine was set up, but I’m gonna throw out a few theories about what could be happening.

If similar code is open in your VS Code project, Copilot can draw context from those adjacent files. This can make it appear that the public model was trained on your private code, when in fact the context is drawn from local files. For example, this is how Copilot includes variable and method names relevant to your project in suggestions.

It’s also possible that your code – or very similar code – appears many times over in public repositories. While Copilot doesn’t suggest code from specific repositories, it does repeat patterns. The OpenAI Codex model (from which Copilot is derived) works a lot like a translation tool. When you use Google to translate from English to Spanish, it’s not like the service has ever seen that particular sentence before. Instead, the translation service understands language patterns (e.g. syntax, semantics, common phrases). In the same way, Copilot translates from English to Python, Rust, JavaScript, etc. The model learns language patterns based on vast amounts of public data. Especially when a code fragment appears hundreds or thousands of times, the model can interpret it as a pattern. We’ve found this happens in <1% of suggestions. To ensure every suggestion is unique, Copilot offers a filter to block suggestions >150 characters that match public data. If you’re not already using the filter, I recommend turning it on by visiting the Copilot tab in user settings.
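To make the idea of that filter concrete, here is a rough sketch of what "block suggestions >150 characters that match public data" could look like. This is an illustration only, not GitHub's implementation: the function names, the whitespace normalization, and the in-memory list standing in for "public data" are all assumptions; only the 150-character threshold comes from the comment above.

```python
# Illustrative sketch only; NOT how Copilot's filter is actually built.
# Assumption: "public data" is a small in-memory list of snippets, and a
# "match" is a verbatim overlap after collapsing whitespace.

THRESHOLD = 150  # characters, the only figure taken from the comment above


def normalize(text):
    """Collapse whitespace runs so formatting differences do not hide a match."""
    return " ".join(text.split())


def matches_public_code(suggestion, public_snippets, threshold=THRESHOLD):
    """Return True if more than `threshold` consecutive characters of the
    suggestion appear verbatim in any public snippet (hypothetical helper)."""
    text = normalize(suggestion)
    window = threshold + 1  # "more than 150" means at least 151 characters
    if len(text) < window:
        return False
    corpus = [normalize(snippet) for snippet in public_snippets]
    # Slide a fixed-size window over the suggestion and look for it in the corpus.
    for start in range(len(text) - window + 1):
        chunk = text[start:start + window]
        if any(chunk in doc for doc in corpus):
            return True
    return False


def filter_suggestions(suggestions, public_snippets):
    """Keep only suggestions that do not trip the overlap check."""
    return [s for s in suggestions if not matches_public_code(s, public_snippets)]
```

A real system would need an index rather than a linear scan, and would have to decide what counts as a match (tokens versus characters, renamed variables, and so on); the sketch ignores all of that.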

This is a new area of development, and we’re all learning. I’m personally spending a lot of time chatting with developers, copyright experts, and community stakeholders to understand the most responsible way to leverage LLMs. My biggest take-away: LLM maintainers (like GitHub) must transparently discuss the way models are built and implemented. There’s a lot of reverse-engineering happening in the community which leads to skepticism and the occasional misunderstanding. We’ll be working to improve on that front with more blog posts from our engineers and data scientists over the coming months.

>>_ryanj+2z
Hey Ryan! Have you ever done any reading on the Luddites? They weren't the anti-technology, anti-progress social force people think they were.

They were highly skilled laborers who knew how to operate complex looms. When auto looms came along, factory owners decided they didn't want highly trained, knowledgeable workers; they wanted highly disposable workers. The Luddites were happy to operate the new looms; they just wanted to realize some of the profit from the savings in labor along with the factory owners. When the factory owners said no, the Luddites smashed the new looms.

Genuinely, and I'm not trying to ask this with any snark, do you view the work you do as similar to the manufacturers of the auto looms? The opportunity to reduce labor but also further the strength of the owner vs the worker? I could see arguments being made both ways and I'm curious about how your thoughts fall.

>>social+rY
But - the only reason anyone makes money (other than tax money) is because they're useful to someone else. Almost all of the clothing industry companies make money from large numbers of people buying their clothes. So they are useful to us.

Similarly, Europe putting 30% of its populace "out of work" by industrialising agriculture is why we don't all have to go work in fields all day. It is a massive net positive for us all.

Moving ice from the arctic into America quickly enough before it melted was a big industry. The refrigerator put paid to that, and improved lives the world over.

Monks retained knowledge through careful copying and retransmission of knowledge during the medieval times in the UK. That knowledge was foundational in the incredible acceleration of development in the UK and neighbouring countries in the 18th and 19th centuries. But the printing press, that rendered those monks much less relevant to culture and academia, was still a very good idea that we all still benefit from today.

Soon, millions of car mechanics who specialise in ICE engines will have to retrain or, possibly, just be made redundant. That may be required for us to reduce our pollution output by a few percent globally, and we may well need to do that.

The exact moment in history when workers who've learned how to do one job are rendered obsolete is painful, yes, and they are well within their rights to do what they can to retain a living. But that doesn't mean those workers are somehow right; nor that all subsequent generations should have to delay or forgo the life improvement that a useful advance brings, nor all of the advances that would be built on that advance.

>>robert+fg1
> the only reason anyone makes money (other than tax money) is because they're useful to someone else.

Stealing, scamming, gambling, inheriting, collecting interest, price gouging, slavery, underpaying workers, supporting laws to undermine competitors… Plenty of ways to make money without being useful—or by being actively harmful—to someone else.

> Almost all of the clothing industry companies make money from large numbers of people buying their clothes. So they are useful to us.

We don’t need all that clothing, made by monetarily exploiting people in poor countries and sold by emotionally exploiting people in rich countries under the guise of “fashion”. The usefulness line has long been crossed, it’s about profit profit profit.

>>latexr+Vm1
> Plenty of ways to make money without being useful—or by being actively harmful—to someone else.

I don't equate, say, "making money" with "stealing money". I mean the way people do things within the law. Inheriting is different; the money is already made. Interest is being useful to someone else, via the loan of capital.

>>robert+CG1
> I mean the way people do things within the law.

The examples considered that: gambling, collecting interest, price gouging, underpaying workers, supporting laws to undermine competitors.
