I don't see how Copilot or similar tools can solve this problem without vetting each project.
If you can't trust that the code in a project is compatible with the project's license, then the only option I see is that Copilot cannot exist.
I love free software and whatnot, but I have a feeling this situation would've been quite different if Copilot had been made by the free software community and accidentally trained on some non-free code.
That seriously undersells the responsibility of Microsoft/GitHub.
It seems as though they did approximately zero work to verify that the code wasn't infringing. Things they could have tried but apparently didn't:
1) Ask developers to opt in to Copilot scanning their repositories, and alongside that have them certify that they hold copyright over every line of code in the repository.
2) Use a training dataset of only public repositories from established groups, released under pre-identified, suitable licenses, e.g. BSD-licensed code from the various BSD OSes.
3) Seek out examples from standard libraries in other programming languages with suitable licenses.
It seems like they did nothing and just hoped. I can't see how anyone would try to rely on this thing in a commercial context after it's been shown to do this over and over. The well has been poisoned.
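For what it's worth, options 1 and 2 above aren't hard to express as a filter over candidate repositories. Here's a rough sketch of what I mean, combining the two (my own assumption, they could also be applied separately). The `Repo` fields, the function names, and the allowlist contents are all made up for illustration; nothing here reflects how Copilot's training data was actually selected, and a real allowlist would need legal review.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical allowlist of SPDX license identifiers (assumption: BSD-style
# licenses only, per option 2; a real list would need legal review).
ALLOWED_LICENSES = frozenset({"BSD-2-Clause", "BSD-3-Clause"})


@dataclass(frozen=True)
class Repo:
    """Hypothetical record describing a candidate repository."""
    name: str
    spdx_license: Optional[str]  # None when no license was detected
    opted_in: bool               # owner agreed to scanning (option 1)
    owner_certified: bool        # owner certified they hold copyright (option 1)


def eligible_for_training(repo: Repo, allowed=ALLOWED_LICENSES) -> bool:
    """A repo qualifies only if it is opted in, certified, and allowlisted."""
    return (
        repo.opted_in
        and repo.owner_certified
        and repo.spdx_license in allowed
    )


def select_training_repos(repos: Iterable[Repo], allowed=ALLOWED_LICENSES) -> List[Repo]:
    """Keep only the repos that pass every check; everything else is excluded."""
    return [r for r in repos if eligible_for_training(r, allowed)]


# Made-up sample data, purely for illustration.
candidates = [
    Repo("bsd-tool", "BSD-3-Clause", opted_in=True, owner_certified=True),
    Repo("gpl-lib", "GPL-3.0-only", opted_in=True, owner_certified=True),
    Repo("no-consent", "BSD-2-Clause", opted_in=False, owner_certified=False),
    Repo("unlicensed", None, opted_in=True, owner_certified=True),
]
selected_names = [r.name for r in select_training_repos(candidates)]
print(selected_names)
```

The point of the design is that exclusion is the default: a repo with an unknown license, no opt-in, or no certification never reaches the training set. It still relies on the owner's certification being truthful, so it reduces the problem rather than eliminating it.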