zlacker

[return to "My AI skeptic friends are all nuts"]
1. retrac+J 2025-06-02 21:16:59
>>tablet+(OP)
Machine translation and speech recognition. The state of the art for these is a multi-modal language model. I'm hearing impaired verging on deaf, and I use this technology all day every day. I wanted to watch an old TV series from the 1980s. There are no subtitles available. So I fed the show into a language model (Whisper) and now I have passable subtitles that allow me to watch the show.
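That workflow can be sketched in a few lines of Python. This is a hedged sketch, not the commenter's actual setup: it assumes the open-source `openai-whisper` package, and the filenames are placeholders; only the SRT formatting helpers are shown in full.

```python
# Sketch of the subtitle workflow: transcribe with Whisper, then write the
# timed segments out as a SubRip (.srt) file.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> '01:01:01,500'."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render Whisper-style segments (dicts with start/end/text) as SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n"
                      f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
                      f"{seg['text'].strip()}\n")
    return "\n".join(blocks)

# With `openai-whisper` installed, the full pipeline would look like:
#   import whisper
#   result = whisper.load_model("small").transcribe("episode01.mkv")
#   with open("episode01.srt", "w", encoding="utf-8") as f:
#       f.write(segments_to_srt(result["segments"]))
```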

Am I the only one who remembers when that was the stuff of science fiction? It was not so long ago an open question if machines would ever be able to transcribe speech in a useful way. How quickly we become numb to the magic.

◧◩
2. albert+R4 2025-06-02 21:39:10
>>retrac+J
That's not quite true. The state of the art in both speech recognition and translation is still a dedicated model trained for that task alone. The gap is getting smaller and smaller, though, and it also depends heavily on who invests how much training budget.

For example, for automatic speech recognition (ASR), see: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

The current best ASR model has 600M params (tiny compared to LLMs, way faster than any LLM: 3386.02 RTFx vs. 62.12 RTFx, and much cheaper) and was trained on 120,000 hours of speech. In comparison, the next best speech LLM (quite close in WER, but slightly worse) has 5.6B params and was trained on 5T tokens and 2.3M hours of speech. It has always been like this: for a fraction of the cost, you get a pure ASR model that still beats every speech LLM.
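For intuition on what those throughput numbers mean in practice, here is a back-of-the-envelope calculation. It assumes the usual definition of RTFx (audio duration divided by processing time), which is how the leaderboard reports it:

```python
# Time needed to transcribe one hour of audio at the quoted RTFx throughputs.
# RTFx = audio_duration / processing_time, so processing_time = audio / rtfx.

def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    return audio_seconds / rtfx

hour = 3600.0
dedicated = processing_seconds(hour, 3386.02)  # dedicated ASR model
speech_llm = processing_seconds(hour, 62.12)   # best speech LLM
print(f"dedicated ASR: {dedicated:.1f} s per hour of audio")   # ~1.1 s
print(f"speech LLM:    {speech_llm:.1f} s per hour of audio")  # ~58 s
print(f"speedup:       ~{speech_llm / dedicated:.0f}x")        # ~55x
```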

The same is true for translation models, at least when there is enough training data, i.e., for popular language pairs.

However, LLMs are obviously more powerful in what they can do beyond just speech recognition or translation.

◧◩◪
3. edflsa+s6 2025-06-02 21:47:18
>>albert+R4
What translation models are better than LLMs?

The problem with Google-Translate-type models is that the interface is completely wrong. Translation is not sentence -> translation; it's (sentence, context) -> translation (or even (sentence, context) -> (translation, commentary)). You absolutely have to be able to input contextual information, instructions about how certain terms are to be translated, etc. This is trivial with an LLM.
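A minimal sketch of what that (sentence, context) -> translation interface looks like as an LLM prompt. The prompt wording and the `glossary` convention are illustrative assumptions, not any particular vendor's API:

```python
# Build a translation prompt that carries context and terminology constraints,
# which a sentence-in/sentence-out interface cannot express.

def build_translation_prompt(sentence, source_lang, target_lang,
                             context="", glossary=None):
    lines = [f"Translate the following {source_lang} sentence into {target_lang}."]
    if context:
        lines.append(f"Context: {context}")
    for term, rendering in (glossary or {}).items():
        lines.append(f"Always translate '{term}' as '{rendering}'.")
    lines.append(f"Sentence: {sentence}")
    lines.append("Reply with the translation only.")
    return "\n".join(lines)

# "Läufer" is German for both "runner" and the chess bishop; only the
# context disambiguates it.
prompt = build_translation_prompt(
    "Der Läufer ist bedroht.", "German", "English",
    context="Commentary on a chess game.",
    glossary={"Läufer": "bishop"},
)
```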

◧◩◪◨
4. thatjo+Fa 2025-06-02 22:14:07
>>edflsa+s6
This is true, and LLMs crush Google in many translation tasks, but they do too many other things. They can and do go off script, especially if they "object" to the content being translated.

"As a safe AI language model, I refuse to translate this" is not a valid translation of "spierdalaj".

◧◩◪◨⬒
5. selfho+mb 2025-06-02 22:18:51
>>thatjo+Fa
That's literally an issue with the tool being made defective by design by the manufacturer, not with the tool category itself.
◧◩◪◨⬒⬓
6. Aachen+5o 2025-06-02 23:38:35
>>selfho+mb
Was thinking the same about the censoring, but going off-script? Have you seen DeepL or similar tools invent things?
◧◩◪◨⬒⬓⬔
7. thatjo+xj1 2025-06-03 09:40:32
>>Aachen+5o
I've seen people use ChatGPT to translate for them, and seen it embellish texts with its typical obsessions, like "combining" and "engagement".