zlacker

[return to "My AI skeptic friends are all nuts"]
1. retrac+J[view] [source] 2025-06-02 21:16:59
>>tablet+(OP)
Machine translation and speech recognition. The state of the art for these is a multi-modal language model. I'm hearing impaired, verging on deaf, and I use this technology all day, every day. I wanted to watch an old TV series from the 1980s, and there were no subtitles available. So I fed the show through a speech-to-text model (Whisper), and now I have passable subtitles that let me watch it.
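
For the curious, the whole thing is only a few lines. A rough sketch of what I did, assuming the openai-whisper Python package and ffmpeg are installed; the file names and model size are placeholders:

    # Sketch: transcribe an episode with Whisper and write an .srt subtitle file.
    # Assumes `pip install openai-whisper` and ffmpeg on the PATH.
    import whisper

    def srt_timestamp(seconds: float) -> str:
        # SRT timestamps look like 00:01:23,456
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    model = whisper.load_model("medium")        # smaller models run faster, less accurately
    result = model.transcribe("episode01.mkv")  # ffmpeg pulls the audio track out of the video

    with open("episode01.srt", "w", encoding="utf-8") as f:
        for i, seg in enumerate(result["segments"], start=1):
            f.write(f"{i}\n")
            f.write(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n")
            f.write(seg["text"].strip() + "\n\n")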

Am I the only one who remembers when that was the stuff of science fiction? Not so long ago it was an open question whether machines would ever be able to transcribe speech in a useful way. How quickly we become numb to the magic.

◧◩
2. Beetle+p5[view] [source] 2025-06-02 21:41:34
>>retrac+J
> Machine translation and speech recognition.

Yes, yes and yes!

I tried speech recognition many times over the years (Dragon, etc.). Each one had an initial "wow" factor, but they simply were not good enough to use. 95% accuracy is not good enough.

Now I record my voice, transcribe it with Whisper, and pass the transcript to an LLM for cleanup. The LLM step is what finally made this feasible.

It's not perfect. I still have to correct things, but only about a tenth as often as I used to. When I'm transcribing notes for myself, I'm at the point where I don't even bother verifying the output. Small errors are OK for my own notes.
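
Roughly, the pipeline looks like this. A sketch only, assuming the openai-whisper package and the openai Python SDK; the model names and the prompt are placeholders for whatever you prefer:

    # Sketch: transcribe a dictated note with Whisper, then have an LLM tidy it up.
    # Assumes `pip install openai-whisper openai` and OPENAI_API_KEY in the environment.
    import whisper
    from openai import OpenAI

    raw = whisper.load_model("base").transcribe("voice_note.wav")["text"]

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable model works
        messages=[
            {"role": "system",
             "content": "Clean up this dictated note: fix punctuation, casing, and "
                        "obvious mis-transcriptions. Do not add or remove content."},
            {"role": "user", "content": raw},
        ],
    )
    print(resp.choices[0].message.content)  # the cleaned-up note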

◧◩◪
3. n8cpdx+px[view] [source] 2025-06-03 01:02:21
>>Beetle+p5
Have they solved the problem of Whisper making up plausible-sounding junk whenever there is silence or a pause in the audio (i.e. plausible enough that, reading it, you would have no idea it was completely hallucinated)?