As users, we're forced to browse the Web through a million agreements that say "by using this site you agree to our Terms". So what stops a site from saying "by crawling this site to train your AI, you agree to share profits with us" or something similar, particularly if you can prove that your data ends up being used?
This would block search engines, but for some URLs that may be fine, such as pages hosting data one would not want LLMs to hoover up.