zlacker

[parent] [thread] 12 comments
1. pietz+(OP)[view] [source] 2026-02-04 16:47:16
Do we know if this is better than Nvidia Parakeet V3? That has been my go-to model locally and it's hard to imagine there's something even better.
replies(6): >>tylerg+83 >>czottm+nf >>whinvi+6h >>m1el+Nn >>moffka+4e1 >>d4rkp4+FM1
2. tylerg+83[view] [source] 2026-02-04 17:01:15
>>pietz+(OP)
I've been using Parakeet V3 locally and, totally anecdotally, this feels more accurate but slightly slower.
3. czottm+nf[view] [source] 2026-02-04 17:54:16
>>pietz+(OP)
I liked Parakeet v3 a lot until it started to drop whole sentences, willy-nilly.
replies(3): >>cypher+2d1 >>WXLCKN+7A1 >>d4rkp4+RM1
4. whinvi+6h[view] [source] 2026-02-04 18:01:27
>>pietz+(OP)
Came here to ask the same question!
5. m1el+Nn[view] [source] 2026-02-04 18:29:05
>>pietz+(OP)
I've been using nemotron ASR with my own ported inference, and happy about it:

https://huggingface.co/nvidia/nemotron-speech-streaming-en-0...

https://github.com/m1el/nemotron-asr.cpp https://huggingface.co/m1el/nemotron-speech-streaming-0.6B-g...

replies(1): >>Multic+Bz
6. Multic+Bz[view] [source] [discussion] 2026-02-04 19:19:59
>>m1el+Nn
I'm so amazed to find out just how close we are to the Star Trek voice computer.

I used to use Dragon Dictation to draft my first novel; I had to learn a 'language' to tell the rudimentary engine how to recognize my speech.

And then I discovered [1] and have been using it for some basic speech recognition, amazed at what a local model can do.

But it can't transcribe any text until I finish recording a file, and only then does it start working, so the feedback loop is a series of very slow batches.

And now you've posted this cool solution which streams audio chunks to a model in a continuous flow of small pieces, amazing, just amazing.

Now if I can only figure out how to contribute streaming speech-to-text to Handy or a similar app, local STT will be a solved problem for me.

[1] https://github.com/cjpais/Handy
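The chunked-streaming idea above can be sketched in a few lines. This is a minimal illustration, not Handy's or Nemotron's actual API: `recognize_chunk` is a hypothetical stand-in for whatever per-chunk call a streaming model exposes, and the chunk size is an assumed 100 ms at 16 kHz.

```python
# Sketch: instead of transcribing one finished recording, slice the
# audio buffer into small fixed-size chunks and feed each to the model
# as it arrives, collecting partial transcripts along the way.

CHUNK_SAMPLES = 1600  # 100 ms of 16 kHz mono audio (assumed rate)

def audio_chunks(samples, size=CHUNK_SAMPLES):
    """Yield successive fixed-size slices of an audio buffer."""
    for start in range(0, len(samples), size):
        yield samples[start:start + size]

def stream_transcribe(samples, recognize_chunk):
    """Feed chunks to a (hypothetical) recognizer, join partial text."""
    parts = []
    for chunk in audio_chunks(samples):
        partial = recognize_chunk(chunk)  # returns text (or "") per chunk
        if partial:
            parts.append(partial)
    return " ".join(parts)
```

In a real app the loop would consume chunks from the microphone callback rather than a pre-recorded buffer, which is what closes the feedback-latency gap.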

replies(1): >>m1el+121
7. m1el+121[view] [source] [discussion] 2026-02-04 21:32:49
>>Multic+Bz
you should check out

https://github.com/pipecat-ai/nemotron-january-2026/

discovered through this twitter post:

https://x.com/kwindla/status/2008601717987045382

replies(1): >>kwindl+871
8. kwindl+871[view] [source] [discussion] 2026-02-04 22:00:06
>>m1el+121
Happy to answer questions about this (or work with people on further optimizing the open source inference code here). NVIDIA has more inference tooling coming, but it's also fun to hack on the PyTorch/etc stuff they've released so far.
9. cypher+2d1[view] [source] [discussion] 2026-02-04 22:29:14
>>czottm+nf
Yeah, I think the multilingual improvements in V3 caused some kind of regression for English - I've noticed large blocks occasionally dropped as well, so reverted to v2 for my usage. Specifically nvidia/parakeet-tdt-0.6b-v2 vs nvidia/parakeet-tdt-0.6b-v3
10. moffka+4e1[view] [source] 2026-02-04 22:34:22
>>pietz+(OP)
Parakeet is really good imo too, and it's just 0.6B so it can actually run on edge devices. 4B is massive, I don't see Voxtral running realtime on an Orin or fitting on a Hailo. An Orin Nano probably can't even load it at BF16.
11. WXLCKN+7A1[view] [source] [discussion] 2026-02-05 00:56:54
>>czottm+nf
Oh god am I glad to read this. Thought it was my microphone or something.
12. d4rkp4+FM1[view] [source] 2026-02-05 02:44:08
>>pietz+(OP)
I’m curious about this too. On my M1 Max MacBook I use the Handy app on macOS with Parakeet V3 and I get near instant transcription, accuracy slightly less than slower Whisper models, but that drop is immaterial when talking to CLI coding agents, which is where I find the most use for this.

https://github.com/cjpais/Handy

13. d4rkp4+RM1[view] [source] [discussion] 2026-02-05 02:46:28
>>czottm+nf
I didn’t see that but I do get a lot of stutters (words or syllables repeated 5+ times), not sure if it’s a model problem or post processing issue in the Handy app.