Then it dawned on me how many companies are deeply integrating Copilot into their everyday workflows. It's the perfect Trojan Horse.
There may very well be clever techniques that don't require directly training on the users' data. Perhaps generating a parallel paraphrased corpus as they serve user queries - one which they CAN train on legally.
The amount of value unlocked by stealing practically ~everyone's lunch makes me not want to put that past anyone who's capable of implementing such a technology.