This isn't the only option though? You could restrict it to data where permission has been acquired, and many people would probably grant permission for free or for a small fee. Lots of stuff already exists in the public domain.
What ML people seem to want is the ability to just scoop up a billion images off the net with a spider and then feed them into their network, turning the unpaid labor of thousands-to-millions into profit. That is transparently unfair, I think. If you're going to enrich yourself, you should also enrich the people who made your success possible.