GitHub Copilot, with “public code” blocked, emits my copyrighted code

>>davidg+(OP)
Howdy, folks. Ryan here from the GitHub Copilot product team. I don’t know how the original poster’s machine was set up, but I’m gonna throw out a few theories about what could be happening.

If similar code is open in your VS Code project, Copilot can draw context from those adjacent files. This can make it appear that the public model was trained on your private code, when in fact the context is drawn from local files. For example, this is how Copilot includes variable and method names relevant to your project in suggestions.

It’s also possible that your code – or very similar code – appears many times over in public repositories. While Copilot doesn’t suggest code from specific repositories, it does repeat patterns. The OpenAI Codex model (from which Copilot is derived) works a lot like a translation tool. When you use Google to translate from English to Spanish, it’s not like the service has ever seen that particular sentence before. Instead, the translation service understands language patterns (i.e. syntax, semantics, common phrases). In the same way, Copilot translates from English to Python, Rust, JavaScript, etc. The model learns language patterns based on vast amounts of public data. Especially when a code fragment appears hundreds or thousands of times, the model can interpret it as a pattern. We’ve found this happens in <1% of suggestions. To ensure every suggestion is unique, Copilot offers a filter to block suggestions >150 characters that match public data. If you’re not already using the filter, I recommend turning it on by visiting the Copilot tab in user settings.

This is a new area of development, and we’re all learning. I’m personally spending a lot of time chatting with developers, copyright experts, and community stakeholders to understand the most responsible way to leverage LLMs. My biggest take-away: LLM maintainers (like GitHub) must transparently discuss the way models are built and implemented. There’s a lot of reverse-engineering happening in the community which leads to skepticism and the occasional misunderstanding. We’ll be working to improve on that front with more blog posts from our engineers and data scientists over the coming months.

>>_ryanj+2z
Hey Ryan! Have you ever done any reading on the Luddites? They weren't the anti-technology, anti-progress social force people think they were.

They were highly skilled laborers who knew how to operate complex looms. When auto looms came along, factory owners decided they didn't want highly trained, knowledgeable workers; they wanted highly disposable workers. The Luddites were happy to operate the new looms, they just wanted to realize some of the profit from the savings in labor along with the factory owners. When the factory owners said no, the Luddites smashed the new looms.

Genuinely, and I'm not trying to ask this with any snark, do you view the work you do as similar to the manufacturers of the auto looms? The opportunity to reduce labor but also further the strength of the owner vs the worker? I could see arguments being made both ways and I'm curious about how your thoughts fall.

>>social+rY
But - the only reason anyone makes money (other than tax money) is because they're useful to someone else. Almost all of the clothing industry companies make money from large numbers of people buying their clothes. So they are useful to us.

Similarly, the reason Europe put 30% of its populace "out of work" by industrialising agriculture is why we don't have to all go work in fields all day. It is a massive net positive for us all.

Moving ice from the arctic into America quickly enough before it melted was a big industry. The refrigerator put paid to that, and improved lives the world over.

Monks retained knowledge through careful copying and retransmission of knowledge during the medieval times in the UK. That knowledge was foundational in the incredible acceleration of development in the UK and neighbouring countries in the 18th and 19th centuries. But the printing press, that rendered those monks much less relevant to culture and academia, was still a very good idea that we all still benefit from today.

Soon, millions of car mechanics who specialise in ICE engines will have to retrain or, possibly, just be made redundant. That may be required for us to reduce our pollution output by a few percent globally, and we may well need to do that.

The exact moment in history when workers who've learned how to do one job are rendered obsolete is painful, yes, and they are well within their rights to do what they can to retain a living. But that doesn't mean those workers are somehow right; nor that all subsequent generations should have to delay or forego the life improvement that a useful advance brings, nor all of the advances that would be built on that advance.

>>robert+fg1
You see this argument over and over again but it’s the exception that proves the rule.

Most of the time when it’s made it’s just papering over yet another situation where a surplus is being squeezed out of a transaction by a parasitic manager class using principal-agent problem dynamics.

The people who invented this stuff are always trying to tell you they’ve invented the cotton gin or something when in fact they’ve just come up with a clever way to take someone else’s work and exploit it.

>>CPLX+Rt1
What was described wasn't the principal-agent problem. If I'm an employee and my job becomes simpler or more productive through an automation investment by someone else, I don't think I deserve part of the increased profit unless I'm part of a profit-sharing agreement that would also see me absorb losses.

>>robert+sx1
> unless I'm part of a profit-sharing agreement that would also see me absorb losses

And how many workers even have the possibility of an arrangement like this, i.e. a worker-owned cooperative?

Yes, that is exactly the point. When a labour-saving technological development comes along, it's payday to the capital-having class and dreary times for the labour-doing class.

>>andrep+jL2
And it's good for everyone down the line, because the good being produced becomes more affordable and better. It might be hard to zoom out from these current times when we can expect continual progress, but this is one of the only reasons why anything ever gets better.

I'm from the UK, and we used to make motorbikes. They got - correctly - outcompeted by Japanese bikes in the 1950s that were built with more modern investment and tooling. If Japan hadn't done that, we'd have more motorcycle jobs in the UK, and terrible motorcycles that still leaked oil because the seam of the crankcase would still be vertical and not horizontal.

I'm not saying anything about this process is perfect and pain-free, but it seems that a lot of the things we have now are because of processes like this. Should Tesla sell through dealerships instead of direct to consumers? I think the answer is, "Tesla should do what's best for its customers", and not "Tesla should act to keep dealership jobs and not worry about what's best for its customers."

Businesses exist for their customers and not their employees, and having just been part of a business that, shall we say, radically downsized, I've seen a little of the pain of that. Thankfully it was a high tech business, and as the best employment protection is other employers, and there are loads of employers wanting tech skills, I've seen my great colleagues all get new jobs. But I think it's ultimately disempowering to think of your employer like a superior when it should feel like an equal whose goals happen to coincide with yours for a while.
