I think that the argument being made by some artists is that the training process itself violates copyright just by using the training data.
That’s quite different from arguing that the output violates copyright, which is what the tweet in this case was about.
Now a human can take inspiration from like 100 different sources and probably end up with something that no one would recognize as derivative of any of them. But it also wouldn't be obvious which sources the human drew from.
But with an ML model, it's clearly a derivative, in that the learned function is mathematically derived from its dataset, and so are all the resulting outputs.
I think this raises a new question though, because until now "derivative" kind of implied that the output was recognizable as being derived.
With AI, you can tweak it so the output doesn't end up being easily recognizable as derived, but we know it's still derived.
Personally I think what really matters is what the legal framework around it should be. How do we balance the interests of AI companies against those of the developers, artists, and citizens who authored the data that enabled the AI to exist? And what rights should each party be given?