https://ploum.net/2022-12-05-drowning-in-ai-generated-garbag...
I would've imagined training sets were heavily curated and annotated. We already know how to solve this problem for training humans (or our kids would never learn anything useful), so I imagine we could solve it similarly for AIs.
In the end, if it's quality content, learning from it is beneficial - no matter who produced it. Garbage needs to be eliminated, and the distinction can be made either by human trainers or by already-trained AIs. I have no idea how to train the latter, but I am no expert in this field - just like (I suspect) the author of that blog.