zlacker

Well yes, there'd be no way for the copilot model, as currently specified and trained, to know.

But it IS possible to train a model for that. In fact, I believe ML models can be fantastic "code archaeologists", giving us insights into not just direct copying, but inspiration and idioms as well. They don't just have the code, they have commit histories with timestamps.

A causal fact which these models could incorporate, is that we know data from the past wasn't influenced by data from the future. I believe that is a lever to pry open a lot of wondrous discoveries, and I can't wait until a model with this causal assumption is let loose on Spotify's catalog, and we get a computer's perspective on who influenced who.

But in the meantime, discovering where copy-pasted code originated should be a lot easier.

replies(1): >>pca006+7h

>>vinter+(OP)
Ah, a plagiarism checker that can understand simple code transformation and find the original source? Sounds like a good idea for patent trolls and I have no idea about how/if copyright laws can be apply in this case. Does copying the idea but not copying the code verbatim constitutes copyright violation?

replies(1): >>vinter+2s

>>pca006+7h
The patent troll version of the algorithm needs the victim's bank balance as input too. In fact that's probably all it needs.

It would be much more valuable for people who care about the truth.