Is it better? Worse? Why do they only compare to gpt4o mini transcribe?
For Whisper API online (with v3 large) I've found "$0.00125 per compute second" which is the cheapest absolute I've ever found.
The thing that makes it particularly misleading is that models that do transcription to lowercase and then use inverse text normalization to restore structure and grammar end up making a very different class of mistakes than Whisper, which goes directly to final form text including punctuation and quotes and tone.
But nonetheless, they're claiming such a lower error rate than Whisper that it's almost not in the same bucket.
There's a reason that quite a lot of good transcribers still use V2, not V3.
Why it should be Whisper v3? They even released an open model: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...