The New York Times is suing OpenAI and Microsoft for copyright infringement

>>ssgodd+(OP)
The arguments about being able to mimic New York Times “style” are weak, but the fact that they got it to emit verbatim NY Times content seems bad for OpenAI:

> As outlined in the lawsuit, the Times alleges OpenAI and Microsoft’s large language models (LLMs), which power ChatGPT and Copilot, “can generate output that recites Times content verbatim”

>>Aurorn+84
Sarah Silverman is claiming the same thing about her book.

But I've tried really hard to get ChatGPT to output sentences verbatim from her book and just can't get it to. In fact, I can't even get it to answer simple questions about facts that are in her book but nowhere else -- it just says it doesn't know.

Similarly, I haven't been able to get it to reproduce any NYT text verbatim unless it's a common quote or a passage the NYT is itself quoting. The other exception is a specific popular quote from an article that went viral, but there aren't many of those.

Has anyone here ever found a prompt that regurgitates a paragraph of a NYT article, or even a long sentence, that's just regular reporting in a regular article?
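One way to make "verbatim" concrete when testing prompts like this is to measure the longest run of consecutive words shared by the model's output and the article. Below is a minimal sketch, assuming you have plain-text copies of both. The function name `longest_shared_run` and the lowercase, punctuation-stripping tokenization are illustrative choices of mine, not anything from the complaint or from OpenAI.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep only letters, digits, and apostrophes,
    # so "dog," and "dog" count as the same word.
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(output: str, article: str) -> list[str]:
    """Return the longest run of consecutive words found in both texts."""
    a, b = tokenize(output), tokenize(article)
    best_len, best_end = 0, 0
    # Longest-common-substring dynamic programming over words:
    # prev[j] is the length of the shared run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


article = "The quick brown fox jumps over the lazy dog near the river bank."
output = "Sure! A quick brown fox jumps over the lazy dog, as they say."
print(" ".join(longest_shared_run(output, article)))
```

How many shared words count as "a long sentence" of regurgitation is a judgment call. A run that spans a whole sentence of ordinary reporting is more telling than a short stock phrase.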

>>crazyg+e6
The complaint has specific examples they got from ChatGPT.

There is a precedent: there were some exploit prompts that could be used to get ChatGPT to emit random training set data. It would emit repeated words or gibberish that then spontaneously converged onto snippets of training data.
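If you are poking at outputs like the ones described above, the interesting part is what comes after the repetition breaks. Here is a rough sketch, assuming the output opens with a single word repeated many times. The helper `split_divergence` is hypothetical and only separates the repeated prefix from the tail. Checking whether that tail matches real training data is a separate step.

```python
def _norm(word: str) -> str:
    # Assumption: ignore case and trailing/leading punctuation when comparing.
    return word.strip(".,!?;:").lower()


def split_divergence(text: str) -> tuple[str, int, str]:
    """Split text into (repeated word, repeat count, tail after repetition)."""
    words = text.split()
    if not words:
        return "", 0, ""
    first = _norm(words[0])
    count = 0
    for w in words:
        if _norm(w) != first:
            break  # the repetition has broken; everything from here is the tail
        count += 1
    return first, count, " ".join(words[count:])


word, n, tail = split_divergence("poem poem poem poem The quick brown fox")
print(word, n, repr(tail))
```

The tail could then be compared against a known article with any verbatim-overlap measure.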

OpenAI quickly worked to patch those and, presumably, invested energy into preventing it from emitting verbatim training data.

It wasn’t as simple as asking it to emit verbatim articles, IIRC. It was more about it accidentally emitting segments of training data for specific sequences that were semi-rare enough.
