Is there a decent estimate of how much of ChatGPT's training data is copyrighted work that might have to be removed, depending on how a few court cases turn out? GPT-4 is supposed to be an order of magnitude larger than the open-source models, which already use essentially everything that can be used without asking. So is that whole extra order of magnitude copyrighted material?