zlacker

[return to "Voxtral Transcribe 2"]
1. observ+4a 2026-02-04 15:53:39
>>meetpa+(OP)
Native diarization, this looks exciting. Edit: or not, there's no diarization in the real-time model.

https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...

~9GB model.

2. coder5+2f 2026-02-04 16:16:09
>>observ+4a
The diarization is on Voxtral Mini Transcribe V2, not Voxtral Mini 4B.
3. sbroth+ho 2026-02-04 16:54:40
>>coder5+2f
Do you have experience with that model for diarization? Does it feel accurate, and what's its real-time factor on a typical GPU? Diarization has been the biggest thorn in my side for a long time.
4. ashenk+VS 2026-02-04 19:03:41
>>sbroth+ho
You can test it yourself for free on https://console.mistral.ai/build/audio/speech-to-text

I tried it on an English-language podcast episode, and apart from identifying one host as two different speakers (only once, for a few sentences at the start), the rest was flawless from what I could see.