zlacker

[return to "Voxtral Transcribe 2"]
1. simonw+kg[view] [source] 2026-02-04 16:21:17
>>meetpa+(OP)
This demo is really impressive: https://huggingface.co/spaces/mistralai/Voxtral-Mini-Realtim...

Don't be confused if it says "no microphone", the moment you click the record button it will request browser permission and then start working.

I spoke fast and dropped in some jargon and it got it all right - I said this and it transcribed it exactly right, WebAssembly spelling included:

> Can you tell me about RSS and Atom and the role of CSP headers in browser security, especially if you're using WebAssembly?

◧◩
2. Oras+qh[view] [source] 2026-02-04 16:25:22
>>simonw+kg
Thank you for the link! Their playground in Mistral does not have a microphone. it just uploads files, which does not demonstrate the speed and accuracy, but the link you shared does.

I tried speaking in 2 languages at once, and it picked it up correctly. Truly impressive for real-time.

◧◩◪
3. Tactic+EG1[view] [source] 2026-02-04 23:09:14
>>Oras+qh
> Truly impressive for real-time.

Impressive indeed. Works way better than the speech recognition I first got demo'ed in... 1998? I remember you had to "click" on the mic everytime you wanted to speak and, well, not only the transcription was bad, it was so bad that it'd try to interpret the sound of the click as a word.

It was so bad I told several people not to invest in what was back then a national tech darling:

https://en.wikipedia.org/wiki/Lernout_%26_Hauspie

That turned out to be a massive fraud.

But ...

> I tried speaking in 2 languages at once, and it picked it up correctly.

I'm a native french speaker and I tried with a very simple sentence mixing french and english:

"Pour un pistolet je prefere un red dot mais pour une carabine je prefere un ACOG" (aka "For a pistol I prefer a red dot but for a carbine I prefer an ACOG")

And instead I got this:

"Je prépare un redote, mais pour une carabine, je préfère un ACOG."

"Je prépare un redote ..." doesn't mean anything and it's not at all what I said.

I like it, it's impressive, but literally the first sentence I tried it got the first half entirely wrong.

◧◩◪◨
4. jnaina+K02[view] [source] 2026-02-05 01:33:10
>>Tactic+EG1
I used sell the Mac Voice Navigator (from Articulate Systems) in the 90s, which was a SCSI based hardware box that you plug into a Mac, Mac SE or Mac II. It used to use the same L&H speech recognition tech (if I recall correctly) and was called the "User Interface" of the future.

Horrible speech recognition rate and very glitchy. Customers hated it, and lots of returns/complaints.

A few years later, L&H went bankrupt. And so did Articulate Systems.

https://applerescueofdenver.com/products-page/macintosh-to-p...

[go to top]