There only needs to be an answer if it's determined that some number isn't copyright infringement. The easy answer would be to say that it's the process, not the size of the training set, that prevents the works from being transformative (and thus copyrightable).