zlacker

[parent] [thread] 2 comments
1. nabaki+(OP)[view] [source] 2024-05-21 03:20:32
Azure Speech tts is capable of speeding up, slowing down, sarcasm, etc with SSML. I wouldn't be surprised if it's what OpenAI is using on the backend.
replies(1): >>vessen+r
2. vessen+r[view] [source] 2024-05-21 03:25:48
>>nabaki+(OP)
Greg has specifically said it's not an SSML-parsing text model; he's said it's an end to end multimodal model.

FWIW, I would find it very surprising if you could get the low latency expressiveness, singing, harmonizing, sarcasm and interpretation of incoming voice through SSML -- that would be a couple orders of magnitude better than any SSML product I've seen.

replies(1): >>nabaki+6e3
◧◩
3. nabaki+6e3[view] [source] [discussion] 2024-05-22 00:49:56
>>vessen+r
Not sure about the low latency aspect, but I've seen everything else you mentioned with SSML. Also, I can't find where Greg said that, could you point me to it?
[go to top]