> For example, in 2019, The Times published a Pulitzer-prize winning, five-part series on predatory lending in New York City’s taxi industry. The 18-month investigation included 600 interviews, more than 100 records requests, large-scale data analysis, and the review of thousands of pages of internal bank records and other documents, and ultimately led to criminal probes and the enactment of new laws to prevent future abuse.

> OpenAI had no role in the creation of this content, yet with minimal prompting, will recite large portions of it verbatim.

This is the smoking gun. GPT-4 is a large model, and large models are more likely to memorize training data and reproduce it. The court filing has many such examples: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
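For anyone wondering what "verbatim" could mean in measurable terms, here is a rough Python sketch. It is only my assumption of one way to check it, not how the complaint produced its examples: it finds the longest run of characters shared by a source text and a model output. The function names `longest_verbatim_run` and `verbatim_fraction` are made up for illustration, and the strings are placeholders, not real article text or model output.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(source: str, output: str) -> str:
    """Return the longest substring that appears verbatim in both texts."""
    # autojunk=False keeps difflib from discarding very frequent characters,
    # which would otherwise break up matches on long texts.
    matcher = SequenceMatcher(None, source, output, autojunk=False)
    match = matcher.find_longest_match(0, len(source), 0, len(output))
    return source[match.a : match.a + match.size]


def verbatim_fraction(source: str, output: str) -> float:
    """Share of the output covered by its single longest verbatim run."""
    if not output:
        return 0.0
    return len(longest_verbatim_run(source, output)) / len(output)


# Placeholder strings standing in for an article and a model completion.
source_text = "the quick brown fox jumps over the lazy dog"
model_output = "he said the quick brown fox jumps high"

run = longest_verbatim_run(source_text, model_output)
fraction = verbatim_fraction(source_text, model_output)
print(repr(run), round(fraction, 2))
```

A long shared run, or a high fraction, is what "reciting" looks like in numbers. Where the legal line sits is a separate question that this does not answer.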

IANAL, but that's a slam-dunk case of copyright infringement.

NYT will likely win.

This is also why OpenAI should not go YOLO on scaling up to GPT-5, which will likely recite even more copyrighted content. More parameters, more memorization.