Voxtral Transcribe 2

>>meetpa+(OP)
I noticed that this model is multilingual and understands 14 languages. For many use cases, we probably only need a single language, and the extra 13 are simply adding extra latency. I believe there will be a trend in the coming years of trimming the fat off of these jack of all trades models.

https://aclanthology.org/2025.findings-acl.87/

>>janals+Gy
honestly the inability to correctly transcribe the 4 language mix i use in my everyday life is one of the major blockers for adopting ASR tech in my own tooling. this coming from someone who literally works in that field.

turns out, outside the US, many people speak more than one language. :)

edit: I should say was a major blocker, because the last iterations of open-weight models actually work better and better. it's often the UX that's not thought for these usecases.

zlacker