Statement from Scarlett Johansson on the OpenAI "Sky" voice

>>mjcl+(OP)
I found the whole ChatGPT-4o demo to be cringe inducing. The fact that Altman was explicitly, and desperately, trying to copy "her" at least makes it understandable why he didn't veto the bimbo persona - it's actually what he wanted. Great call by Scarlett Johansson in not wanting to be any part of it.

One thing these trained voices make clear is that it's a tts engine generating ChatGPT-4o's speech, same as before. The whole omni-modal spin suggesting that the model is natively consuming and generating speech appears to be bunk.

>>HarHar+mg
> One thing these trained voices make clear is that it's a tts engine generating ChatGPT-4o's speech, same as before.

I'm not familiar with the specifics of how AI models work but doesn't the ability from some of the demos rule out what you've said above? Eg. The speeding up and slowing down speech and the sarcasm don't seem possible if TTS was a separate component

>>monroe+gl
I have no special insight into what they're actually doing, but speeding up and slowing down speech have been features of SSML for a long time. If they are generating a similar markup language it's not inconceivable that it would be possible to do what you're describing.

>>mmcwil+wo
It's also possible that any such enunciation is being hallucinated from the text by the speech model.

AI models exist to make up bullshit that fills a gap. When you have a conversation with any LLM it's merely autocompleting the next few lines of what it thinks is a movie script.

zlacker