zlacker

[return to "Voxtral Transcribe 2"]
1. pietz+Dm 2026-02-04 16:47:16
>>meetpa+(OP)
Do we know if this is better than Nvidia Parakeet V3? That has been my go-to model locally and it's hard to imagine there's something even better.
2. m1el+qK 2026-02-04 18:29:05
>>pietz+Dm
I've been using nemotron ASR with my own ported inference, and I'm happy with it:

https://huggingface.co/nvidia/nemotron-speech-streaming-en-0...

https://github.com/m1el/nemotron-asr.cpp

https://huggingface.co/m1el/nemotron-speech-streaming-0.6B-g...

3. Multic+eW 2026-02-04 19:19:59
>>m1el+qK
I'm so amazed to find out just how close we are to the Star Trek voice computer.

I used to use Dragon Dictation to draft my first novel; I had to learn a 'language' to tell the rudimentary engine how to recognize my speech.

And then I discovered [1] and have been using it for some basic speech recognition, amazed at what a local model can do.

But it can't transcribe anything until I finish recording a file, and only then does it start working, so the feedback loop is slow and batch-like.

And now you've posted this cool solution which streams audio to the model in a continuous series of small chunks. Amazing, just amazing.

Now if I can just figure out how to contribute that kind of streaming speech-to-text to Handy or something similar (roughly the sketch below), local STT will be a solved problem for me.

[1] https://github.com/cjpais/Handy
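
To make the streaming idea concrete, here is a rough sketch in Python. The model object and its transcribe_chunk() method are hypothetical stand-ins, not the actual nemotron-asr.cpp API; the point is just feeding the engine small fixed-size chunks and printing partial text as it arrives, instead of waiting for the whole recording:

    import wave

    CHUNK_MS = 80  # feed the engine small fixed-size chunks

    def stream_transcribe(path, model):
        # Emit partial text chunk-by-chunk instead of transcribing
        # the finished file in one batch.
        with wave.open(path, "rb") as wav:
            frames_per_chunk = wav.getframerate() * CHUNK_MS // 1000
            while True:
                chunk = wav.readframes(frames_per_chunk)
                if not chunk:
                    break
                # transcribe_chunk is hypothetical -- it stands in for
                # whatever incremental-decode call the real engine exposes.
                partial = model.transcribe_chunk(chunk)
                if partial:
                    print(partial, end="", flush=True)

The same loop works with live microphone input: swap the wave reader for an audio capture stream that yields chunks of the same size.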

4. m1el+Eo1 2026-02-04 21:32:49
>>Multic+eW
You should check out

https://github.com/pipecat-ai/nemotron-january-2026/

I discovered it through this Twitter post:

https://x.com/kwindla/status/2008601717987045382

5. kwindl+Lt1 2026-02-04 22:00:06
>>m1el+Eo1
Happy to answer questions about this (or work with people on further optimizing the open source inference code here). NVIDIA has more inference tooling coming, but it's also fun to hack on the PyTorch/etc stuff they've released so far.