Yes, the training of the model itself is (or should be) a transformative act, so you can train a model on whatever you have legal access to view.
However, that doesn't mean that the output of the model is automatically not infringing. If the model is prompted to create a copy of some copyrighted work, that is (or should be) still a violation.
Just like memorizing a book isn't infringement, but reproducing a book from memory is.
If MS were compelled to reveal how these completions are generated, there’s at least a possibility that they directly use public repositories to source text chunks that their “model” suggested were relevant. (I put “model” in quotes because it could be more than just a model: vector or search databases, or some other orchestration across multiple workloads.)
I don't see why a company that has been waging a multi-decade war against the GPL and users' rights would stop at _public_ repositories.
The only thing it suggests is that they recognize that a subset of users worry about it. Whether or not GitHub worries about it any further isn’t suggested.
Don’t think about it from an actual “rights” perspective. Think about the entire copyright issue as a “too big to fail” issue.