This doesn't make any sense. If it's a speech-to-speech transformer, then 'training' could just mean a voice sample placed at the beginning of the context window. Or it could be one of several voices used for the instruct-tuning or RLHF process. Either way, it doesn't debunk anything.