The New York Times is suing OpenAI and Microsoft for copyright infringement

>>ssgodd+(OP)
Solidly rooting for NYT on this - it’s felt like many creative organizations have been asleep at the wheel while their lunch gets eaten for a second time (the first being at the birth of modern search engines.)

I don’t necessarily fault OpenAI’s decision to initially train their models without entering into licensing agreements - they probably wouldn’t exist and the generative AI revolution may never have happened if they put the horse before the cart. I do think they should quickly course correct at this point and accept the fact that they clearly owe something to the creators of content they are consuming. If they don’t, they are setting themselves up for a bigger loss down the road and leaving the door open for a more established competitor (Google) to do it the right way.

>>theGnu+Ri
I think we need a lot of clarity here. I think it's perfectly sensible to look at gigantic corpuses of high quality literature as being something society would want to be fair use for training an LLM to better understand and produce more correct writing... but the actual information contained in NYT articles should probably be controlled primarily by NYT. If the value a business delivers (in this case the information of the articles) can be freely poached without limitation by competitors then that business can't afford to actually invest in delivering a quality product.

As a counter argument it might be reasonable to instead say that the NYT delivers "current information" so perhaps it'd be fair to train your model on articles so long as they aren't too recent... but I think a lot of the information that the NYT now relies on for actual traffic is their non-temporal stuff - including things like life advice and recipes.

zlacker