Failing to solve every problem does not mean a solution is a failure.
From sunscreen to seatbelts, the world is full of great solutions that occasionally fail due to statistics and large numbers.
If you "violate" a robots.txt the server administrator can choose to block your bot (if they can fingerprint it) or IP (if its static).
With an ai.txt there is no potential downside to violating it, unless we get new legislation giving it legal standing. Because ML models are opaque about exactly what content they were trained on, there is no obvious avenue for retaliation or retribution.
I don't think that's what OP is envisioning based on their post!
The purpose OP is suggesting in the submission is the opposite: helping AI crawlers understand what the page/website is about without having to infer it from the content itself.
I don't see the OP saying anything about "ai.txt" being for that? They're advocating it as a way that AIs could use fewer tokens to understand what a site is about.
(Which I also don't think is a good idea, since we already have lots of ways of including structured metadata in pages, but the main problem is not that crawlers would ignore it.)
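For example, a page can already carry a machine-readable summary via plain meta tags or a schema.org JSON-LD block, roughly like this (just a sketch, the name and description are made up):

    <meta name="description" content="A blog about embedded Rust and retro hardware.">
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "WebSite",
      "name": "Example Blog",
      "description": "A blog about embedded Rust and retro hardware."
    }
    </script>

A crawler that wants a cheap summary can read that instead of the whole page, so an ai.txt would mostly duplicate metadata conventions that already exist.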
There is something to be said, though, for OP's point that it's actually better to do nothing than to publish an ai.txt, because it can give a false sense of security, which is obviously not what you want.
And if you feel like rolling out the "welcome friend!" doormat to a particular training-data crawler, you are free to dedicate as detailed a robots.txt block as you like to its user agent header of choice. No new conventions needed, everything is already in place.
In other words, there's no need to create an ai.txt when the robots.txt standard can just be extended.
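For example (a sketch, assuming the crawler identifies itself and actually honors robots.txt, as OpenAI's GPTBot claims to):

    User-agent: GPTBot
    Disallow: /

That's the same mechanism search crawlers have used for decades, just aimed at an AI user agent.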
Any time a business is caught using that content, they can't claim they only used publicly available information, because the ai.txt specifically signalled to everyone, in a clear and unambiguous manner, that permission to use the page for AI training is withheld.
It would also be useful to distinguish training crawlers from indexing crawlers. Maybe I'm publishing personal content. It's useful for me to have it indexed for search, but I don't want an AI to be able to simulate me or my style.
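That distinction can already be expressed in robots.txt, provided the operator publishes separate user agents for indexing and training. A rough sketch using Google's published tokens (Googlebot for search, Google-Extended for AI training use):

    # Keep the site in search results
    User-agent: Googlebot
    Allow: /

    # Opt out of AI training use
    User-agent: Google-Extended
    Disallow: /

Of course this only works for crawlers that advertise the split and honor it.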