The first issue, as I see it, is that every bit of data the model ingested in training has affected what the model _is_, and therefore every token of output has benefited from every token of input. When you receive anything from an LLM, you are essentially receiving a customized digest of all the training data.

The second issue is that it takes an enormous amount of training data to train a model. To enable users to extract ‘anything’ from the model, the model has to be trained on ‘everything’. So I think these models should be looked at as public goods that consume everything and can produce anything. Keeping a paper trail on the ‘everything’ part (the input) and sending a continuous little trickle of capital to all of the sources misses the point. That’s like requiring a person to pay a little bit of money to every teacher, mentor, and everyone else they’ve learned from, every time they benefit from what they learned.
OpenAI isn’t marching into the online news space and posting NY Times content verbatim in an effort to steal market share from the NY Times. OpenAI is in the business of turning ‘everything’ (input tokens) into ‘anything’ (output tokens). If someone manages to extract a preserved chunk of input tokens, that’s an interesting edge case of the model, not what the model is in the business of doing.