I think Google is probably thinking hard about the problem of training AI: you don't want to train on the output of other AI. That doesn't mean such content shouldn't be processed, just that it shouldn't be used for training. It may also be worth distinguishing content derived from material you produced manually from content derived from third parties' content.
Said another way, I expect that Google isn't just implementing a new allowlist/denylist. It's likely about exposing new information about content.
Now that I think of it, why do we put up with robots.txt at all?
> A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests
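For reference, the whole format is just a handful of directives grouped by user agent. A minimal sketch (the paths here are made up):

    # robots.txt served from the site root
    User-agent: *             # applies to all crawlers
    Disallow: /admin/         # don't crawl anything under /admin/
    Allow: /admin/public/     # except this subtree
    Crawl-delay: 10           # non-standard; some crawlers honor it, Google ignores it
    Sitemap: https://example.com/sitemap.xml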
If someone overloads your site with automated requests, how is that not criminal? Why aren't they liable?
For something to be criminal, a specific law in the criminal code has to be intentionally broken.
There is a world of difference between an intentional DoS and a crawler adding some marginal traffic to a server and then backing off when the server's responses start failing.
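That backing-off behavior is cheap to implement. A minimal sketch in Python, assuming the requests library; the user-agent string and timing constants are made up:

    import time
    import requests

    def polite_get(url, max_retries=5, base_delay=2.0):
        """Fetch a URL, backing off when the server signals it's struggling."""
        for attempt in range(max_retries):
            resp = requests.get(url, timeout=10,
                                headers={"User-Agent": "example-crawler/0.1"})
            # 429 (Too Many Requests) and 5xx mean: slow down and try again later.
            if resp.status_code == 429 or resp.status_code >= 500:
                # Honor a numeric Retry-After header if present,
                # otherwise back off exponentially.
                retry_after = resp.headers.get("Retry-After", "")
                delay = float(retry_after) if retry_after.isdigit() \
                        else base_delay * (2 ** attempt)
                time.sleep(delay)
                continue
            return resp
        return None  # give up quietly instead of hammering the server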
If Google says they'll delist your site when they detect AI-generated content that you haven't declared, that's also a you problem (you meaning webmasters). It's a bit silly to think it's a purely one-way relationship. You're more than welcome to block Google from indexing your site (trivially!), and they're welcome to not include you in their service for not following their guidelines.
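Blocking Googlebot really is that trivial; two lines in robots.txt at the site root:

    User-agent: Googlebot
    Disallow: /

Strictly speaking, that blocks crawling; to keep pages that other sites link to out of the index entirely, you'd also serve a noindex robots meta tag or X-Robots-Tag header.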