zlacker

[return to "Voxtral Transcribe 2"]
1. janals+Gy 2026-02-04 17:41:39
>>meetpa+(OP)
I noticed that this model is multilingual and understands 14 languages. For many use cases we probably only need a single language, and the other 13 just add latency. I suspect the coming years will bring a trend of trimming the fat off these jack-of-all-trades models.

https://aclanthology.org/2025.findings-acl.87/
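
To make the idea concrete, here's a rough PyTorch sketch of the kind of trimming I mean. The layout (decoder.embed_tokens, lm_head) is a generic encoder-decoder ASR stack I'm assuming for illustration, not this model's actual structure:

    import torch

    def trim_vocab(model, keep_token_ids):
        # Keep only the rows for the target language's tokens in the input
        # embedding table and the output projection; the rest is dead weight
        # if those tokens are never decoded.
        keep = torch.tensor(sorted(keep_token_ids), dtype=torch.long)

        old_emb = model.decoder.embed_tokens            # nn.Embedding, (V, d)
        new_emb = torch.nn.Embedding(len(keep), old_emb.embedding_dim)
        new_emb.weight.data.copy_(old_emb.weight.data[keep])

        old_head = model.lm_head                        # nn.Linear, d -> V
        new_head = torch.nn.Linear(old_head.in_features, len(keep), bias=False)
        new_head.weight.data.copy_(old_head.weight.data[keep])

        model.decoder.embed_tokens = new_emb
        model.lm_head = new_head
        return model  # the tokenizer's ids also need remapping to match

A smaller output softmax is where the decode-time savings would come from; whether it's worth it depends on how big the vocabulary is relative to the rest of the model.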

2. popalc+4K 2026-02-04 18:27:54
>>janals+Gy
It doesn't make sense to have a language-restricted transcription model, because of code-switching. People aren't machines; we don't stick to a single language with perfect consistency. Even monolingual people move in and out of their native language when using "borrowed" words and phrases. A single-language model will often fail to handle that.
3. janals+0N1 2026-02-04 23:48:21
>>popalc+4K
Everything is a tradeoff, and different use cases require different tradeoffs:

Option A: this model

Option B: faster model, only 1 language

Option C: same size model, only 1 language but higher quality

My point is that option A isn’t always best.

And on the borrowed-words point: there's no rule saying we can't add borrowed words to the vocab. You don't need the whole language for that. I know what "déjà vu" means, but I don't speak French.
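
As a toy sketch of what I mean, assuming a Hugging Face-style tokenizer and model (the checkpoint name below is made up, and whether this particular model exposes anything similar is a separate question):

    from transformers import AutoTokenizer, AutoModelForSpeechSeq2Seq

    # "some-org/english-asr-model" is a placeholder id, not a real checkpoint.
    tokenizer = AutoTokenizer.from_pretrained("some-org/english-asr-model")
    model = AutoModelForSpeechSeq2Seq.from_pretrained("some-org/english-asr-model")

    # Add a handful of borrowed phrases instead of a whole second language.
    num_added = tokenizer.add_tokens(["déjà vu", "schadenfreude", "per se"])
    if num_added > 0:
        # New embedding rows start randomly initialized; a short fine-tune on
        # audio containing these phrases is needed before they're useful.
        model.resize_token_embeddings(len(tokenizer))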

4. popalc+lk2 2026-02-05 04:28:00
>>janals+0N1
That depends entirely on how common the borrowed term is. And anyway, options B and C are always going to be insufficient for my code-switching example: as another commenter pointed out, it's very common to want to refer to a foreign work (a song, movie, or book) by its foreign-language title. Monolingual ASR solutions break on this all the time. Try asking Alexa to play a Spanish-language track on Spotify; it fails frequently.

The real world is like that.
