zlacker

[return to "OpenAI didn’t copy Scarlett Johansson’s voice for ChatGPT, records show"]
1. zug_zu+AJ1[view] [source] 2024-05-23 13:50:47
>>richar+(OP)
When I first used ChatGPT's voice assistant, I was like "Wow, this one is clearly Scarlett Johansson from Her, they even copy her mannerisms."

No amount of unverifiable "records" (just pieces of paper provided by somebody with a multimillion-dollar incentive to show one outcome) will change my mind.

But if they can produce the actual voice artist I'd be more open-minded.

◧◩
2. stavro+LM1[view] [source] 2024-05-23 14:05:47
>>zug_zu+AJ1
Funny, I'm the opposite. I saw clips from the film after the controversy (it's been ten years since I saw the film itself), and Sky sounds nothing like Johansson to me. No amount of unverifiable "records" will change my mind.
◧◩◪
3. m_ke+uN1[view] [source] 2024-05-23 14:09:00
>>stavro+LM1
1. The Sky voice currently available in the app is a different model from the one they presented (the one in the app is pure TTS; the new one in GPT-4o is a proper multimodal model that can do speech in and out, end to end)

2. Look at these images and tell me they didn't intend to replicate "Her": https://x.com/michalwols/status/1792709377528647995

◧◩◪◨
4. stavro+0O1[view] [source] 2024-05-23 14:12:03
>>m_ke+uN1
Which one are we saying sounds like Johansson? I'm talking about the TTS voice in the app; is everyone else talking about the multimodal voice from the 4o demos?

Also, whether they *intended* to replicate Her and whether they *did* in the end are very different.

◧◩◪◨⬒
5. m_ke+mO1[view] [source] 2024-05-23 14:13:41
>>stavro+0O1
This one: https://youtu.be/vgYi3Wr7v_g?feature=shared&t=22

compare it to: https://youtu.be/GV01B5kVsC0?feature=shared&t=125

◧◩◪◨⬒⬓
6. Beetle+bZ1[view] [source] 2024-05-23 15:10:18
>>m_ke+mO1
OK, I watched this expecting to be convinced.

I think they might have mimicked the style. The voice, though, is not even close. If I heard both voices in a conversation, I would have thought two different people were talking.

◧◩◪◨⬒⬓⬔
7. godels+Wv2[view] [source] 2024-05-23 17:48:02
>>Beetle+bZ1
Truthfully, you can no longer trust yourself (whichever side of this debate you're on). We're all primed now, and we'll latch onto any distinguishing characteristic. You'd have to listen in a blind test, across several clips, with nothing that reveals which ones are OpenAI and which are from the movie, and nothing else that spoils it.
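
Something like this is what I mean, as a rough sketch (the file names are made up, and you'd want the clips trimmed and loudness-matched first so quality alone doesn't give them away):

    # Minimal blind listening test: shuffle clips, hide labels, score guesses.
    # The listener sees only clip numbers, never the file names below.
    import random

    clips = [
        ("sky_01.wav", "openai"),   # hypothetical clip files
        ("sky_02.wav", "openai"),
        ("her_01.wav", "movie"),
        ("her_02.wav", "movie"),
    ]
    random.shuffle(clips)  # presentation order carries no information

    results = []
    for i, (path, label) in enumerate(clips, 1):
        # play(path)  <- however you play audio locally
        guess = input(f"Clip {i}: openai or movie? ").strip().lower()
        results.append(guess == label)

    print(f"{sum(results)}/{len(results)} correct "
          f"({len(results) / 2:.0f} expected by chance)")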

And I wouldn't put the metric at 50/50, i.e. indistinguishable. The reasonable bar is whether it sounds __like__ her, and that can hold even if you identify the chatbot 100% of the time! (e.g. what if I just had a roboticized version of a person's voice?) Truth is that I can send you clips of the same person[0], tell you they're different people, and a good portion of people will be certain that they are different people (maybe __you're different__™, but that doesn't matter).

So use that as the litmus test, either way. Not whether you think they sound different, but rather "would a reasonable person think this is supposed to sound like ScarJo?" Not you; other people. Then ask yourself whether there's sufficient evidence that OpenAI either purposefully intended to clone her voice OR got so set in their ways (maybe after she declined, having already hyped themselves up) that they tricked themselves into only accepting a voice actor who ended up sounding similar. That last part is important because it shows how such a thing can happen without anyone ever explicitly stating the requirement (and maybe without them even recognizing it themselves). Remember that we humans do a lot of subconscious processing (I have a whole other rant about people building AGI -- a field I'm in, fwiw -- not spending enough time understanding their own minds or the minds of animals).
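
If you want to put a number on "other people", it's just a binomial question: did listeners say "sounds like her" more often than a coin flip would predict? A sketch with made-up numbers (assumes scipy is available; treating 50/50 as the no-signal baseline is itself a debatable modeling choice):

    # Do listeners, in aggregate, say it sounds like her more often than chance?
    # The counts below are made up, purely for illustration.
    from scipy.stats import binomtest

    n_listeners = 200  # hypothetical survey size
    n_said_yes = 143   # hypothetical "yes, sounds like ScarJo" count

    result = binomtest(n_said_yes, n_listeners, p=0.5, alternative="greater")
    print(f"{n_said_yes}/{n_listeners} said yes, p = {result.pvalue:.3g}")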

Edit:

[0] I should add that there's a robustness issue here, and it's going to be a distinguishing factor for people deciding whether the voices are different. Without a doubt, those voices are "different", but the question is in what way. Different the way someone's voice changes day to day? Different the way someone sounds on the phone vs. in person? Certainly the audio quality differs, and if you're expecting a 1-to-1 match where we can overlay the waveforms perfectly, then no, you'd never get that. But that's not a fair test.
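
If you did want something fairer than eyeballing waveforms, you'd compare robust summaries of the voices instead. Rough sketch, assuming librosa is installed and with made-up file names:

    # Raw waveforms of even the "same voice" never match sample-for-sample,
    # so compare summary features instead. File names are hypothetical.
    import librosa
    import numpy as np

    def fingerprint(path):
        y, sr = librosa.load(path, sr=16000)            # resample to a common rate
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        return mfcc.mean(axis=1)                        # crude per-clip summary

    a, b = fingerprint("clip_a.wav"), fingerprint("clip_b.wav")
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    print(f"cosine similarity: {cos:.3f}")

Even two recordings of the same person land well short of a similarity of 1.0, which is the point.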

[go to top]