The New York Times is suing OpenAI and Microsoft for copyright infringement

>>ssgodd+(OP)
Even if they win against openAI, how would this prevent something like a Chinese or Russian LLM from “stealing” their content and making their own superior LLM that isnt weakened by regulation like the ones in the United States.

And I say this as someone that is extremely bothered by how easily mass amounts of open content can just be vacuumed up into a training set with reckless abandon and there isn’t much you can do other than put everything you create behind some kind of authentication wall but even then it’s only a matter of time until it leaks anyway.

Pandora’s box is really open, we need to figure out how to live in a world with these systems because it’s an un winnable arms race where only bad actors will benefit from everyone else being neutered by regulation. Especially with the massive pace of open source innovation in this space.

We’re in a “mutually assured destruction” situation now, but instead of bombs the weapon is information.

>>dissid+B6
Access to ressources is hardly a new problem: when I was an NLP graduate student about a decade ago a teacher of us had scrapped (and continued to do so) a major newspaper for years to make a corpus. The legality of that was questionable at best, yet it was used in academic paper and a subset for training.

The same is equally applicable to image: Google got rich in part by making illegal copies of whatever image he could find. Existing regulations could be updated to include ML model but that won't stop bad or big enough actors to do what they want.

> We’re in a “mutually assured destruction” situation now

No, we aren't. Very good spam generators aren't comparable to mass destruction weapons.

zlacker