1. GaggiX+(OP) 2026-02-04 16:07:35
GPT-4o mini transcribe is better and actually real-time. Whisper is trained to encode the entire audio (or at least 30 s chunks) and then decode it.
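
To illustrate the 30 s chunk point, here is a minimal sketch of cutting audio into fixed windows before any decoding can start. It assumes 16 kHz mono samples in a plain Python list; `split_into_windows` is a hypothetical helper, not part of any Whisper package, and real long-form pipelines shift the window using predicted timestamps rather than cutting at fixed offsets.

```python
SAMPLE_RATE = 16_000   # Whisper operates on 16 kHz audio
WINDOW_SECONDS = 30    # each encoder pass covers one 30 s window


def split_into_windows(samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Split samples into fixed-length windows, zero-padding the last one.

    Hypothetical helper for illustration only.
    """
    window = sample_rate * window_seconds
    chunks = []
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        # Pad the final window with silence so every chunk has the same length.
        chunk += [0.0] * (window - len(chunk))
        chunks.append(chunk)
    return chunks
```

Since each window is encoded as a whole before decoding begins, a pipeline built this way cannot emit text for a window until the window is full (or padded), which is why it is not a natural fit for streaming.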
replies(2): >>emmett+s >>mdrzn+C
2. emmett+s 2026-02-04 16:09:51
>>GaggiX+(OP)
The linked article claims the average word error rate for Voxtral mini v2 is lower than that of GPT-4o mini transcribe.
replies(1): >>GaggiX+N
3. mdrzn+C 2026-02-04 16:10:28
>>GaggiX+(OP)
So "GPT-4o mini transcribe" is not just Whisper v3 under the hood? Btw, it's $0.006 / minute.

For an online Whisper API (with large-v3), I've found "$0.00125 per compute second", which is the absolute cheapest I've ever found.

replies(2): >>GaggiX+d1 >>breisa+8G
4. GaggiX+N 2026-02-04 16:11:11
>>emmett+s
GPT-4o mini transcribe is better than Whisper; the context is the parent comment.
5. GaggiX+d1 2026-02-04 16:13:00
>>mdrzn+C
>So it's not just Whisper v3 under the hood?

Why should it be Whisper v3? They even released an open model: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...

6. breisa+8G 2026-02-04 19:05:21
>>mdrzn+C
DeepInfra offers Whisper V3 at $0.00045 / minute of transcribed audio.
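
The prices quoted in this thread use different units (per audio minute vs. per compute second), so here is a rough sketch for putting them on the same scale. The per-compute-second figure only converts if you assume a real-time factor (seconds of audio transcribed per second of compute), which nobody in the thread states; both function names are made up for illustration.

```python
def hourly_cost_from_per_minute(price_per_audio_minute):
    """Cost of one hour of audio when billed per minute of audio."""
    return price_per_audio_minute * 60


def hourly_cost_from_compute_second(price_per_compute_second, realtime_factor):
    """Cost of one hour of audio when billed per second of compute.

    realtime_factor is an ASSUMPTION: seconds of audio processed per
    second of compute. It depends on hardware, model size and batching.
    """
    compute_seconds = 3600 / realtime_factor
    return price_per_compute_second * compute_seconds


# Prices as quoted in the thread.
per_minute_prices = {
    "$0.006 / minute": 0.006,
    "$0.00045 / minute": 0.00045,
}
for label, price in per_minute_prices.items():
    print(label, "->", round(hourly_cost_from_per_minute(price), 4), "USD per audio hour")

# The real-time factors below are hypothetical, for comparison only.
for rtf in (1, 12.5, 50, 200):
    cost = hourly_cost_from_compute_second(0.00125, rtf)
    print(f"$0.00125 / compute second at {rtf}x ->", round(cost, 4), "USD per audio hour")
```

Under these assumptions, $0.00125 per compute second only matches $0.006 per audio minute once the service runs at roughly 12.5x real time, and it would need to run far faster than that to match $0.00045 per minute.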