What has been crawled stays crawled and there are plenty of copies of sets of tokens that can be used to retrain a model. For a bit of money you can probably get any set that you really want (bit: billions, but pocket change for anything that is going to go head to head with OpenAI).