Statically linked binaries, for example, have parts of libraries embedded in them. There exist tools that can analyze a binary and try to detect signatures of a known library's code inside it.
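As a rough illustration of the idea (not how any particular tool works), here is a minimal Python sketch: hash every fixed-size slice of a known library's code, then slide the same window over the binary and report offsets where a hash matches. The names `build_signatures` and `find_matches`, the 16-byte window, and the toy byte strings are all assumptions made up for this example. Real tools would also need to cope with relocations and compiler differences, which this ignores.

```python
from __future__ import annotations

import hashlib

WINDOW = 16  # bytes per signature; an arbitrary choice for this sketch


def build_signatures(library_code: bytes, window: int = WINDOW) -> set[bytes]:
    """Hash every `window`-byte slice of the library's code."""
    return {
        hashlib.sha1(library_code[i:i + window]).digest()
        for i in range(len(library_code) - window + 1)
    }


def find_matches(binary: bytes, signatures: set[bytes], window: int = WINDOW) -> list[int]:
    """Return offsets in `binary` whose `window`-byte slice matches a signature."""
    return [
        i
        for i in range(len(binary) - window + 1)
        if hashlib.sha1(binary[i:i + window]).digest() in signatures
    ]


# Toy data standing in for real machine code (hypothetical, for illustration only).
library = bytes(range(64))
binary = b"\xff" * 10 + library[8:40] + b"\xee" * 10  # library fragment embedded at offset 10
offsets = find_matches(binary, build_signatures(library))
```

Here `offsets` lists every position in the toy binary where a 16-byte run of the library's code shows up, which is the kind of evidence such a tool would surface.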
In the past there were (and probably still are) companies that offered services to find people who had linked in your code, so you could take whatever action you wanted against them. I can't recall a specific company name right now, but a little Googling would likely turn up some examples.
To be safe, we'd have to get Microsoft to agree to indemnify users (if they really believe using this is safe, they should do so), or wait until a court case on copyright as it applies to training corpora for large models is decided and appeals are exhausted.