zlacker

1. vnoril+(OP) 2023-01-14 08:20:50
Storing copies of training data is pretty much the definition of overfitting, right?

The data must be encoded at various levels of feature abstraction for this stuff to work at all, much like humans learning art, albeit devoid of the input that makes human art interesting (life experience).

I think a more promising avenue for litigating AI plagiarism is to show that the model performs well on some narrow slice of the solution space containing a copyrighted work, but degrades sharply when you deviate from it. Then you could argue that the model has probably stored that distinct work rather than learned a style or a category.
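A minimal sketch of that kind of probe, purely illustrative: a character bigram model stands in for an LLM, and every name and passage below is made up. The idea is that a model which memorized a specific text will be far more confident on the verbatim passage than on a close paraphrase, and that asymmetry is the evidence.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for an LLM: a character bigram model with
# add-one smoothing. A real probe would use an actual model's
# per-token log-likelihoods instead.
def train_bigram(text):
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def avg_nll(counts, text, vocab_size=128):
    # Average negative log-likelihood per character (lower = more confident).
    total = 0.0
    for a, b in zip(text, text[1:]):
        row = counts[a]
        total += -math.log((row[b] + 1) / (sum(row.values()) + vocab_size))
    return total / (len(text) - 1)

# Pretend this passage was in the training data.
memorized = "all happy families are alike; each unhappy family is unhappy in its own way"
model = train_bigram(memorized)

verbatim = memorized
paraphrase = "all cheerful households are similar; every sad household differs in its own manner"

# Much lower loss on the verbatim text than on a nearby paraphrase
# suggests the specific work was stored, not just its style.
print(avg_nll(model, verbatim) < avg_nll(model, paraphrase))  # expect True
```

A real version of this would compare the target work against many independently written works in the same style, so that style itself can't explain the confidence gap.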

replies(1): >>lolind+201
2. lolind+201 2023-01-14 17:53:36
>>vnoril+(OP)
Even that approach seems highly vulnerable to fair use. If the model does not recreate a copyrighted work with enough fidelity to be recognized as such, then how can it be said to be in violation of copyright?