1. XCSme+(OP)[view] [source] 2026-02-04 17:43:21
Is it just me, or is an error rate of 3% really high?

If you transcribe a minute of conversation, you'll have like 5 words transcribed wrongly. In an hour-long podcast, that is 300 wrongly transcribed words.
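
A quick sketch of that arithmetic, assuming a conversational speaking rate of about 150 words per minute (an assumed figure, not one given in the thread). The function name `expected_word_errors` is made up for illustration. At exactly 150 wpm the numbers come out slightly below the rounded ones quoted above.

```python
def expected_word_errors(error_rate: float, words_per_minute: float, minutes: float) -> float:
    """Expected number of wrongly transcribed words.

    Assumes errors are spread evenly, so the count is simply
    error_rate * total words spoken.
    """
    return error_rate * words_per_minute * minutes


# ASSUMPTION: ~150 words per minute; real speech varies a lot by speaker.
WORDS_PER_MINUTE = 150

per_minute = expected_word_errors(0.03, WORDS_PER_MINUTE, 1)
per_hour = expected_word_errors(0.03, WORDS_PER_MINUTE, 60)

print(f"errors per minute: {per_minute:.1f}")
print(f"errors per hour:   {per_hour:.0f}")
```

Rounding the per-minute figure up to 5 is what gives the 300 words per hour mentioned above.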

replies(1): >>cootsn+R
2. cootsn+R[view] [source] 2026-02-04 17:46:47
>>XCSme+(OP)
The error rate for human transcription can be as high as 5%.
replies(2): >>XCSme+H2 >>qingch+rc1
◧◩
3. XCSme+H2[view] [source] [discussion] 2026-02-04 17:53:03
>>cootsn+R
Oh wow, I thought humans had something like a 0.1% error rate, if they are native speakers and familiar with the subject being discussed.
replies(3): >>zipy12+pj >>rhdunn+XG >>Nimitz+HM
◧◩◪
4. zipy12+pj[view] [source] [discussion] 2026-02-04 19:01:47
>>XCSme+H2
I was skeptical upon hearing the figure, but various sources do indeed back it up, and [0] is a pretty interesting paper (old but still relevant, since human transcribers haven't changed in accuracy).

[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...

replies(1): >>XCSme+Io
◧◩◪◨
5. XCSme+Io[view] [source] [discussion] 2026-02-04 19:28:00
>>zipy12+pj
I think it's actually hard to verify how correct a transcription is, at scale. I'm curious where those error rate numbers come from, because they should be measured on people actually doing their job.
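
For context, error rates like these are usually reported as word error rate (WER): substitutions, deletions, and insertions divided by the number of words in a trusted reference transcript. The thread does not say which metric its figures use, so treating them as WER is an assumption. A minimal sketch follows; the function name `word_error_rate` and the lowercase/whitespace tokenization are illustrative choices, not a standard implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    # ASSUMPTION: naive normalization (lowercase, split on whitespace).
    # Real evaluations also normalize punctuation, numbers, and so on.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the bikes were very loud", "the bikes were very proud"))
```

Note that this still needs a reference transcript that is itself correct, which is the hard part to get at scale.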
◧◩◪
6. rhdunn+XG[view] [source] [discussion] 2026-02-04 20:51:55
>>XCSme+H2
It can depend a lot on different factors like:

- familiarity with the accent and/or speaker;

- speed and style/cadence of the speech;

- any other audio playing at the same time that can muffle or distort the speech;

- etc.

It can also take multiple passes to get a decent transcription.

replies(1): >>qingch+Cc1
◧◩◪
7. Nimitz+HM[view] [source] [discussion] 2026-02-04 21:17:42
>>XCSme+H2
Most of these errors will not be meaningful. Real speech is full of ambiguities. 3% is low.
◧◩
8. qingch+rc1[view] [source] [discussion] 2026-02-04 23:38:29
>>cootsn+R
I did transcription for a while in 2021. It is absurdly hard. Especially as these days humans only get the difficult jobs that AI has already taken a stab at.

The hardest one I did was for a sports network covering a motocross event where most of what you could hear was the roar of the bikes. There were two commentators I had to transcribe over the top of that mess, and they were using the insider slang nicknames for all the riders, not their published names, so I had to sit and Google forums to find the names of the riders while I was listening. I'm not sure how these local models would be able to handle that insanity at all, because they almost certainly lack enough domain knowledge.

◧◩◪◨
9. qingch+Cc1[view] [source] [discussion] 2026-02-04 23:39:57
>>rhdunn+XG
You missed a giant factor: domain knowledge. Transcribing something outside of your knowledge realm is very hard. I posted above about transcribing the commentary of a motorbike race where the commentators only used the slang names of the riders.