zlacker

[parent] [thread] 0 comments
1. famous+(OP)[view] [source] 2024-05-21 01:01:00
>One thing these trained voices make clear is that it's a tts engine generating ChatGPT-4o's speech, same as before. The whole omni-modal spin suggesting that the model is natively consuming and generating speech appears to be bunk.

This doesn't make any sense. If it's a speech to speech transformer then 'training' could just be a sample at the beginning of the context window. Or it could one of several voices used for the Instruct-tuning or RLHF process. Either way, it doesn't debunk anything.

[go to top]