I think the appropriation, privatization, and monetization of "all human output" by a single (corporate) entity is at least shameless, probably wrong, and maybe outright disgraceful.
But I think OpenAI (or another similar entity) will succeed via the Sackler defense: OpenAI has too many victims for litigation to be feasible, so the courts will preemptively decide not to bother compensating them.
The output is still there for anyone else to train on if they want.
I believe the innovation that will really “win” generative AI in the long term is one that figures out how to keep the model populated with fresh, relevant, quality information in a sustainable way.
I think generative AI represents a chance to fundamentally rethink the value chain around information and research. But for all OpenAI's talk of being a "non-profit" that is "good for humanity", they don't seem very interested in that.
https://en.wikipedia.org/wiki/Sackler_family
The family has been largely successful at avoiding any personal liability in Purdue's litigation. Many people feel the settlements of the Purdue lawsuits were too lenient. One key perceived aspect of the final settlements was that there were simply too many victims of the opioid epidemic for the courts to handle and attempt to make whole.
Legal arguments aside, the goldrush era of data scraping is over. Major sources of content like Reddit and Twitter have killed APIs, added defenses and updated EULAs to avoid being pillaged again. More and more sites are moving content behind paywalls.
There's also the small issue of needing tens of millions of VC dollars to rent or buy hundreds of high-end GPUs. OpenAI and friends are also trying their hardest to prevent others from doing so via 'Skynet'-hysteria-driven regulatory capture.