The New York Times is suing OpenAI and Microsoft for copyright infringement

>>ssgodd+(OP)
I hope this results in Fair Use being expanded to cover AI training. This is way more important to humanity's future than any single media outlet. If the NYT goes under, a dozen similar outlets can replace them overnight. If we lose AI to stupid IP battles in its infancy, we end up handicapping probably the single most important development in human history just to protect some ancient newspaper. Then another country is going to do it anyway, and still the NYT is going to get eaten.

>>solard+Aj
Why shouldn't the creators of the training content get anything for their efforts? With some guiderails in place to establish what is fair compensation, Fair Use can remain as-is.

>>strong+Ul
The issue as I see it is that every bit of data that the model ingested in training has affected what the model _is_ and therefore every token of output from the model has benefited from every token of input. When you receive anything from an LLM, you are essentially receiving a customized digest of all the training data. The second issue is that it takes an enormous amount of training data to train a model. In order to enable users to extract ‘anything’ from the model, the model has to be trained on ‘everything’. So I think these models should be looked at as public goods that consume everything and can produce anything. To have to keep a paper trail on the ‘everything’ part (the input) and send a continuous little trickle of capital to all of the sources is missing the point. That’s like a person having to pay a little bit of money to all of their teachers and mentors and everyone they’ve learned from every time they benefit from what they learned.

OpenAI isn’t marching into the online news space and posting NY Times content verbatim in an effort to steal market share from the NY Times. OpenAI is in the business of turning ‘everything’ (input tokens) into ‘anything’ (output tokens). If someone manages to extract a preserved chunk of input tokens, that’s more like an interesting edge case of the model. It’s not what the model is in the business of doing.

Edit: typo

>>apante+An
What's wrong with paying copyright holders, then? If OpenAI's models are so much more valuable than the sum of the individual inputs' values, why can't the company profit off that margin?

>That’s like a person having to pay a little bit of money to all of their teachers and mentors and everyone they’ve learned from every time they benefit from what they learned.

I could argue that public school teachers are paid by previous students. Not always the ones they taught, but still. But really, this is a very new facet of copyright law. It's a stretch to compare it with existing conventions, and really off to anthropomorphize LLMs by equating them to human students.

zlacker