Do I really want to use a Chinese word processor that spits unattributed passages from the NYT into the articles I write? Once I publish that to my blog now I'm infringing and I can get sued too. Point is I don't see how output which complies with copyright law makes an LLM inferior.
The argument applies equally to code, if your use of ChatGPT, OpenAI etc. today is extensive enough, who knows what copyrighted material you may have incorporated illegally into your codebase? Ignorance is not a legal defense for infringement.
If anything it's a competitive advantage if someone develops a model which I can use without fear of infringement.
Edit: To me this all parallels Uber and AirBnB in a big way. OpenAI is just another big tech company that knew they were going to break the law on a massive scale, and said look this is disruptive and we want to be first to market, so we'll just do it and litigate the consequences. I don't think the situation is that exotic. Being giant lawbreakers has not put Uber or AirBnB out of business yet.
It better. Copyright has essentially fucking ceased to exist in the eyes of AI people. Just because you have a shiny new toy doesn't mean the law suddenly stops applying to you. The internet does its best to route around laws and government but the more technologically up to date bureaucracy becomes, the faster it will catch up.
Which means that either OpenAI is allowed to be the only lawbreaker in the country (because rich and lawyers), or nobody is. I say prosecute 'em and tell them to make tools that follow the law.