If I encode a movie with H.264, there is no way to get the decoder to output "exactly what was in the training data" (the source footage), and I could just as well argue that "like humans extract the important information from large dumps of data, the algorithm does the same".
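To make that premise concrete, here is a minimal sketch (a JPEG round-trip stands in for H.264 since the principle is the same, and it assumes Pillow and NumPy are available): the decoded output is never bit-identical to the original, yet it is obviously derived from it.

```python
# Minimal sketch, not a claim about any specific codec: a lossy round-trip
# (JPEG here, standing in for H.264) does not give back the original bytes,
# yet the result is plainly derived from them.
import io

import numpy as np
from PIL import Image

# Synthetic 256x256 RGB "frame": gradients plus a sine pattern for texture.
yy, xx = np.mgrid[0:256, 0:256]
frame = np.stack([
    xx,
    yy,
    127 + 80 * np.sin(xx / 7.0) * np.sin(yy / 5.0),
], axis=-1).astype(np.uint8)
original = Image.fromarray(frame, mode="RGB")

# Encode lossily, then decode again.
buf = io.BytesIO()
original.save(buf, format="JPEG", quality=75)
decoded = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")

a = np.asarray(original, dtype=np.int16)
b = np.asarray(decoded, dtype=np.int16)

print("bit-identical:", np.array_equal(a, b))          # expected: False
print("mean abs pixel error:", np.abs(a - b).mean())   # expected: small relative to 0-255
```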
I have no reservations about calling an H.264-encoded video redistributed with the wrong attribution "plagiarism", so I don't see what makes Large X Models so different that they deserve a special pass.