These companies argue that their models do not actually reproduce copyrighted material, and that any seemingly infringing use is the responsibility of the model's user rather than of those who produced it. By that logic, anyone could freely generate an infinite number of high-truthiness OpenAI anecdotes, freshly laundered by the inference engine, and OpenAI couldn't use them against the original authors without invalidating its own legal stance on its own models.
Training an LLM with the intent of contravening an NDA is just plain <intent to contravene an NDA>. Everyone would still get sued anyway.
No one building this software wants to “steal from creators”, and the legal precedent for using copyrighted works for the purpose of training is clear with the NYT case against OpenAI.
It’s why things like the recent deal with Reddit to train on their data (which Reddit owns, and which users give up by using the platform) are becoming so important; the same goes for Twitter/X.