zlacker

30 comments
1. m_ke+(OP)[view] [source] 2024-05-23 14:09:00
1. The Sky voice currently available in the app is a different model from the one they presented (one is pure TTS; the new one in GPT-4o is a proper multimodal model that can do speech in and out end to end).

2. Look at these images and tell me they didn't intend to replicate "Her": https://x.com/michalwols/status/1792709377528647995

replies(2): >>stavro+w >>lairv+Xe
2. stavro+w[view] [source] 2024-05-23 14:12:03
>>m_ke+(OP)
Which one are we saying sounds like Johansson? I'm talking about the TTS voice in the app; is everyone else talking about the multimodal voice from the 4o demos?

Also, whether they *intended* to replicate Her and whether they *did* in the end are very different.

replies(1): >>m_ke+S
◧◩
3. m_ke+S[view] [source] [discussion] 2024-05-23 14:13:41
>>stavro+w
This one: https://youtu.be/vgYi3Wr7v_g?feature=shared&t=22

compare it to: https://youtu.be/GV01B5kVsC0?feature=shared&t=125

replies(9): >>zaat+s2 >>maroon+K2 >>skg_RS+f9 >>Factol+h9 >>dilap+wa >>Beetle+Hb >>GaggiX+md >>dylan6+hg >>jasonl+PE1
◧◩◪
4. zaat+s2[view] [source] [discussion] 2024-05-23 14:22:45
>>m_ke+S
Well, I thought it would be similar, but at least with how the Sky voice sounds through the phone speakers, I can hardly find any resemblance.
◧◩◪
5. maroon+K2[view] [source] [discussion] 2024-05-23 14:24:43
>>m_ke+S
I keep reading in the media that Sky was introduced as part of GPT-4o, but that's incorrect. Sky's been around since they introduced the mobile iOS app.

While Sky's voice shares some traits with SJ's, it sounds different enough that I was never confused as to whether it was actually SJ or not.

replies(1): >>fnordp+s8
◧◩◪◨
6. fnordp+s8[view] [source] [discussion] 2024-05-23 14:53:26
>>maroon+K2
I don’t think you understand. 4o introduces a new multimodal Sky replacing the old one. They have only released clips of the new voices. It’s never been in the iOS app. The one you refer to is the old voice model. If you listen to the linked video above it’s very obviously not the same voice (I use Sky on iOS btw)

To be honest, the new Sky is obnoxious and overly emotive. I’m not trying to flirt with my phone.

replies(1): >>maroon+wH1
◧◩◪
7. skg_RS+f9[view] [source] [discussion] 2024-05-23 14:56:49
>>m_ke+S
I am of two minds here. Regardless of the "closeness", there is a whole field of comedy that does impressions of others. That is what is so difficult about the AI discussion. Clearly, there are plenty of humans who can mimic other humans visually, in prose, in voice and mannerisms, etc.

Leaving the IP issue aside, they could clearly have hired a voice actor who closely resembles Johansson, maybe even without additional tweaks to the voice in post-processing. If they did do that, I am not totally sure what position to take on the matter.

replies(1): >>Freeby+Wb
◧◩◪
8. Factol+h9[view] [source] [discussion] 2024-05-23 14:56:58
>>m_ke+S
Those don't sound anything alike, except for both being female voices. Sky is clearly a bit lower and has a lot more vocal fry.
◧◩◪
9. dilap+wa[view] [source] [discussion] 2024-05-23 15:03:17
>>m_ke+S
Thank you for providing a nice side-by-side. This makes it clear to me the voices are not very similar at all. If Johansson had agreed, I have to imagine they would've been able to make a much closer (and less annoying!) voice.
replies(1): >>dilyev+fE
◧◩◪
10. Beetle+Hb[view] [source] [discussion] 2024-05-23 15:10:18
>>m_ke+S
OK, I watched this expecting to be convinced.

I think they might have mimicked the style. The voice, though, is not even close. If I heard both voices in a conversation, I would have thought 2 different people were talking.

replies(3): >>barrel+Al >>causal+cA >>godels+sI
◧◩◪◨
11. Freeby+Wb[view] [source] [discussion] 2024-05-23 15:12:12
>>skg_RS+f9
The important thing is that they never said it was Johansson. They were not pretending to be her. They are not imitating her likeness whatsoever.
replies(1): >>flumpc+Jk
◧◩◪
12. GaggiX+md[view] [source] [discussion] 2024-05-23 15:18:02
>>m_ke+S
Are you using this as an argument about how similar they are? The voices sound distinctly different; I have no problem telling the two apart.
13. lairv+Xe[view] [source] 2024-05-23 15:26:46
>>m_ke+(OP)
Genuine question, what's wrong with trying to replicate in real life an idea from a SciFi movie?

I understand that it could be problematic if OpenAI did one of two things:

- imitated Scarlett Johansson's voice to impersonate her

- misled people into believing that GPT-4o is an official spin-off of the film Her, like calling it “the official Her AI”

The first point is still unclear, and that's precisely the point of the article.

For the second point, the tweets you posted clearly show that the AI from Her served as an inspiration for creating the GPT-4o model, but that is not trademark infringement.

Will Matt Damon receive royalties if a guy is ever stuck on Mars?

replies(3): >>jeroje+5i >>m_ke+wi >>qarl+aj
◧◩◪
14. dylan6+hg[view] [source] [discussion] 2024-05-23 15:32:24
>>m_ke+S
Holy Crappiness, Batman! The OpenAI clip is so bad. Homeboy keeps stepping on "her" lines. So from this I come away thinking that either he's just a rude asshat who doesn't know how to socially interact with people, she's just too damn chatty and doesn't know when to shut up, or maybe it was just really bad editing. Either way, it's not an intriguing promo to me in the least.
replies(1): >>onemor+dT
◧◩
15. jeroje+5i[view] [source] [discussion] 2024-05-23 15:39:24
>>lairv+Xe
Pretty sure the CEO of OpenAI tweeted "Her." after the reveal of the voice.

Isn't that a suggestion that what they're doing is similar to "the Her AI"?

replies(2): >>z7+Lp >>lairv+7y
◧◩
16. m_ke+wi[view] [source] [discussion] 2024-05-23 15:41:27
>>lairv+Xe
Imagine if Facebook came to you and wanted an exclusive license to white-label whatever you work on, then after you rejected them they went and copied most of your code but changed the hue or saturation of some of the colors and shipped it to all of their customers. (There are definitely hours of Scarlett Johansson talking in the dataset that GPT-4o was trained on.)

Would that be ethical?

EDIT: or even better, imagine how OpenAI would react if some company trained its own model by distilling from GPT-4 outputs and then launched a product with it called “ChatGPC”. (They already go after products that have GPT in their name.)

replies(2): >>gs17+yO >>immibi+xY1
◧◩
17. qarl+aj[view] [source] [discussion] 2024-05-23 15:43:57
>>lairv+Xe
> Genuine question, what's wrong with trying to replicate in real life an idea from a SciFi movie?

The thing is, there are several cases where a jury found this exact thing to warrant damages.

But honestly, that is irrelevant. The situation here is that OpenAI is facing a TON of criticism for running roughshod over intellectual property rights. They are claiming that we should trust them and that they are trying to do the right thing.

But in this case, they're dancing on the edge of right and wrong.

I don't mind when a sleazy company makes "MacDougals" to sell hamburgers. But it's not something to be proud of. And it's definitely not a company that I'd trust.

◧◩◪◨⬒
18. flumpc+Jk[view] [source] [discussion] 2024-05-23 15:50:58
>>Freeby+Wb
Some employees were definitely thinking of Scarlett Johansson, even ignoring the reference to the film "Her":

https://x.com/karpathy/status/1790373216537502106

replies(1): >>GaggiX+ss
◧◩◪◨
19. barrel+Al[view] [source] [discussion] 2024-05-23 15:54:27
>>Beetle+Hb
Without commenting on the debate at large, it’s a bit funny to read this comment.

I mean, voice cloning a year or two ago was basically science fiction; now we’re talking about voices being distinguishable as proof that one is not cloned, sourced, or based on someone.

FWIW I also thought it was supposed to be the Her/SJ voice for a long time, until I heard them side by side. Not sure where to stand on the issue, so I’m glad I’m on the sidelines :)

◧◩◪
20. z7+Lp[view] [source] [discussion] 2024-05-23 16:13:36
>>jeroje+5i
Yes, the unprecedented conversational functionality of the GPT-4o demo could be compared to the AI in the movie. Why assume that the tweet was about the voice sounding like Scarlett Johansson?
◧◩◪◨⬒⬓
21. GaggiX+ss[view] [source] [discussion] 2024-05-23 16:28:24
>>flumpc+Jk
Karpathy doesn't work for OpenAI anymore tho.
◧◩◪
22. lairv+7y[view] [source] [discussion] 2024-05-23 16:55:56
>>jeroje+5i
It's a suggestion that they were inspired by the movie, not that they are releasing a product under the "Her" trademark.

It's a movie, not a patent on female-voiced AI assistants.

◧◩◪◨
23. causal+cA[view] [source] [discussion] 2024-05-23 17:07:55
>>Beetle+Hb
I agree they don't sound the same. But, since it's a subjective test, OpenAI was pretty Twitter-foolish to push the "Her" angle after being explicitly rejected by SJ. It's just inviting controversy.
◧◩◪◨
24. dilyev+fE[view] [source] [discussion] 2024-05-23 17:26:38
>>dilap+wa
The cadence and speed in Her are much too fast for any mass-market consumer product.
◧◩◪◨
25. godels+sI[view] [source] [discussion] 2024-05-23 17:48:02
>>Beetle+Hb
Truthfully, you can no longer trust yourself (whichever side you're on in this debate). We're all now primed and we'll pick up any distinguishing characteristics. You'd have to listen to them in a blind test and do this with several clips that do not reveal which ones are OpenAI and which are from a movie or something else that spoils it.

And I wouldn't set the bar at "50/50, needs to be indistinguishable". The bar should be whether it sounds __like__ her to a reasonable degree, which could hold even if people identify the chatbot 100% of the time! (e.g. what if I just had a roboticized version of a person's voice?) The truth is that I can send you clips of the same person[0], tell you they're different people, and a good portion of listeners will be certain that they are different people (maybe __you're different__™, but that doesn't matter).

So use that as the litmus test either way: not whether you think they are different, but rather "would a reasonable person think this is supposed to sound like ScarJo?" Not you, other people. Then ask yourself whether there is sufficient evidence that OpenAI either purposefully intended to clone her voice OR got so set in their ways (maybe after she declined, but had hyped themselves up) that they tricked themselves into only accepting a voice actor who ended up sounding similar. That last part is important because it shows how such a thing can happen without anyone ever explicitly stating such a requirement (and maybe without even recognizing it themselves). Remember that we humans do a lot of subconscious processing (I have a whole other rant on people building AGI -- a field I'm in fwiw -- not spending enough time understanding their minds or the minds of animals).

Edit:

[0] I should add that there's a robustness issue here, and it is going to be a distinguishing factor for people determining whether the voices are different. Without a doubt, those voices are "different", but the question is in what way. The same way someone's voice might change day to day? A difference similar to how someone sounds on the phone vs in person? Certainly the audio quality is different, and if you're expecting a 1-to-1 match where we can overlay the waveforms perfectly, then no, you wouldn't ever be able to do this. But that's not a fair test.
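
The blind test described in this comment could be set up roughly as in the sketch below. It is only an illustration under stated assumptions: the function names, the clip ids, and the two source labels are all hypothetical, and it measures only how often a listener identifies the source. As the comment argues, a high identification rate does not by itself settle whether one voice sounds like the other.

```python
import math
import random


def make_blind_trials(clips, seed=None):
    """Shuffle (source, clip_id) pairs and hide the source from the listener.

    `clips` is a list of (source_label, clip_id) tuples. Clip ids are assumed
    to be neutral (e.g. "clip_07") so that they do not give the source away.
    """
    rng = random.Random(seed)
    shuffled = list(clips)
    rng.shuffle(shuffled)
    # The listener sees only the trial number and the neutral clip id.
    presented = [(i, clip_id) for i, (_, clip_id) in enumerate(shuffled)]
    # The answer key stays with whoever runs the test.
    answer_key = {i: source for i, (source, _) in enumerate(shuffled)}
    return presented, answer_key


def score(answer_key, guesses):
    """Return (number of correct guesses, number of trials)."""
    correct = sum(1 for i, src in answer_key.items() if guesses.get(i) == src)
    return correct, len(answer_key)


def p_value_at_least(correct, total, chance=0.5):
    """One-sided binomial tail: P(X >= correct) if the listener only guessed."""
    return sum(
        math.comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


if __name__ == "__main__":
    # Hypothetical clip list: five clips per source, with neutral ids.
    clips = [("assistant", f"clip_{n:02d}") for n in range(5)]
    clips += [("film", f"clip_{n:02d}") for n in range(5, 10)]
    presented, key = make_blind_trials(clips, seed=1)
    # Stand-in for a real listener: guesses "film" for every clip.
    guesses = {i: "film" for i, _ in presented}
    correct, total = score(key, guesses)
    print(correct, total, p_value_at_least(correct, total))
```

Running several listeners through the same shuffled set, and repeating with clips of one speaker recorded under different conditions, would be one way to probe the robustness issue raised in the footnote.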

◧◩◪
26. gs17+yO[view] [source] [discussion] 2024-05-23 18:25:47
>>m_ke+wi
> then after you rejected them

Per the article's timeline, the analogy would be: they had already licensed a product similar to your more famous one, then you said no, and they kept using the existing similar one.

> But while many hear an eerie resemblance between “Sky” and Johansson’s “Her” character, an actress was hired to create the Sky voice months before Altman contacted Johansson, according to documents, recordings, casting directors and the actress’s agent.

◧◩◪◨
27. onemor+dT[view] [source] [discussion] 2024-05-23 18:48:53
>>dylan6+hg
I think the whole thing was scripted beforehand and approved by Sam Altman, of course.
replies(1): >>dylan6+L91
◧◩◪◨⬒
28. dylan6+L91[view] [source] [discussion] 2024-05-23 20:18:59
>>onemor+dT
That doesn't really make it better, because now a) it was a horrible script, and b) they didn't try to clean up the audio from "her" with anything more than a fade. If you told me this was just some intern making a video, then maybe, but telling me it was scripted just makes it oh so much worse to me.
◧◩◪
29. jasonl+PE1[view] [source] [discussion] 2024-05-23 23:54:52
>>m_ke+S
The OpenAI one is audio recorded from a phone, whereas the movie version is recorded into a mic directly. They will sound different, but there are elements that are the same. Anyone using these to compare, though, and saying they don't hear the difference isn't comparing apples to apples.

However, the fact that there is a debate at all proves there should be more of an investigation done.

◧◩◪◨⬒
30. maroon+wH1[view] [source] [discussion] 2024-05-24 00:15:41
>>fnordp+s8
I've listened to the clips and yes, while 4o Sky is more emotive, it's just that - a more emotive Sky. All the elements that people are pointing to - the huskiness/raspiness - were present in the pre-4o Sky.
◧◩◪
31. immibi+xY1[view] [source] [discussion] 2024-05-24 03:33:34
>>m_ke+wi
Facebook does do this, and Google, and Microsoft, and Apple. I believe they call it "Getting Sherlocked."