With computer-generated voices, some desirable qualities also happen to be attributes of real people's voices. How many of those attributes can a computer-generated voice share with one particular person's voice before it's considered too close to that person's voice?
Real question.
Is there some waveform comparison that a court would accept?
The lawsuits involving Marvin Gaye's family [1] showed that some judges will find copyright infringement even when the melody, harmony and rhythm are different. Granted, the songwriter saying he had reverse-engineered the original song probably had something to do with it, so in the end, intent and artistic perception will always remain factors that no computer function can compute.
[1] https://en.m.wikipedia.org/wiki/Pharrell_Williams_v._Bridgep...