I honestly thought that OpenAI had simply paid for access to the new corpus. I'm actually on the side of OpenAI here - if you put something on the web, you can't get upset when people read it. Training a neural network is not functionally different from a human reading it and remembering it.
But if I were OpenAI, I would have tried to do a deal to pay them anyway. Having official access is surely easier than scraping the web - and the optics of it is much better.