Lawmakers need to jump on this stuff ASAP. Some say that it's no different from a person looking at existing code or art and recreating it from memory or using it as inspiration. But the law already changes when technology gets involved, anyway. There's no law against you and me having a conversation, but I may not be able to record it, depending on the jurisdiction. Similarly, there's no law against you looking at artwork that I post online, but it's not out of the question that a law could exist preventing you from using it as part of an ML training dataset.
Suppose we trained the open AI model on the entire corpus of pop hits from about 1960 onwards.
What are the chances it would get sued for copyright infringement?
If the derivative nature is clear when the same model is trained on popular songs, then it should be the same for code (or visual art, or any number of other domains).
Not arguing for current copyright law, just pointing out the inconsistencies.
For that matter, what would happen if you asked Copilot for a set of Java headers? Asking for a friend!
A more analogous situation would be if the AI model occasionally returned the entirety of "Baby One More Time" by Britney Spears. Yes, I think you'd be sued if you passed off "Baby One More Time" as an original work just because you got it from an open AI tool.