OpenAI isn’t marching into the online news space and posting NY Times content verbatim in an effort to steal market share from the NY Times. OpenAI is in the business of turning ‘everything’ (input tokens) into ‘anything’ (output tokens). If someone manages to extract a preserved chunk of input tokens, that’s more like an interesting edge case of the model. It’s not what the model is in the business of doing.
>That’s like a person having to pay a little bit of money to all of their teachers and mentors and everyone they’ve learned from every time they benefit from what they learned.
I could argue that public school teachers are paid by previous students, through taxes. Not always the ones they taught, but still. More to the point, this is a very new facet of copyright law. It's a stretch to compare it with existing conventions, and it's a mistake to anthropomorphize LLMs by equating them with human students.