And I say this as someone who is extremely bothered by how easily mass amounts of open content can be vacuumed up into a training set with reckless abandon. There isn't much you can do other than put everything you create behind some kind of authentication wall, and even then it's only a matter of time until it leaks anyway.
Pandora's box is really open. We need to figure out how to live in a world with these systems, because it's an unwinnable arms race where only bad actors will benefit from everyone else being neutered by regulation, especially given the massive pace of open-source innovation in this space.
We’re in a “mutually assured destruction” situation now, but instead of bombs the weapon is information.
If your business is profitable only when you get your raw materials for free it's not a very good business.
Also: GPT is not a legal entity in the United States. Humans have different rights than computer software. You are legally allowed to borrow books from the library. You are legally allowed to recite the content you read. You're not allowed to sell verbatim recitations of what you read. This is obvious, I think? But it's exactly what LLMs are doing right now.
Fortunately, the computer isn't the one being sued.
Instead it is the humans who use the computer. And those humans maintain their existing rights, even if they use a computer.
OpenAI is run by humans as well though.
So the same argument applies to them.
Those humans have fair use rights as well.