zlacker

[parent] [thread] 9 comments
1. MengYu+(OP)[view] [source] 2020-06-09 01:36:40
I don't have familiarity with speech-to-text, but wouldn't it be possible to weight words based on their probability in this application to help resolve this? For example, "suspect" is probably a low-frequency word in normal speech but very high-frequency in radio chatter.
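A minimal sketch of that kind of domain re-weighting in Python (all the frequencies, scores, and word lists below are made up for illustration):

    import math

    # Invented relative word frequencies: police-radio domain vs. everyday speech.
    domain_freq = {"suspect": 4e-3, "dispatch": 3e-3}
    general_freq = {"suspect": 1e-4, "dispatch": 5e-5}

    def domain_bonus(word, weight=1.0, floor=1e-6):
        # Log-ratio of domain to general frequency; positive values favor
        # words common on the radio but rare in normal speech.
        return weight * math.log(domain_freq.get(word, floor) /
                                 general_freq.get(word, floor))

    def rerank(hypotheses):
        # hypotheses: (text, acoustic_log_score) pairs from the recognizer.
        return max(hypotheses, key=lambda h: h[1] +
                   sum(domain_bonus(w) for w in h[0].lower().split()))

    # The domain prior rescues "suspect" despite a slightly worse acoustic score.
    print(rerank([("the suspect fled", -4.2), ("the sus pact fled", -4.0)]))
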
replies(6): >>amluto+65 >>marakv+p5 >>haram_+B8 >>rasz+R8 >>abhgh+Tn >>watwut+zG
2. amluto+65[view] [source] 2020-06-09 02:33:28
>>MengYu+(OP)
This needs care. Imagine if you accidentally trained a model that added racism when none was present in the audio.
replies(1): >>VWWHFS+g9
3. marakv+p5[view] [source] 2020-06-09 02:35:58
>>MengYu+(OP)
[My side project uses speech-to-text, so this is an amateur-hour response.]

Yes, but it takes a bit to build up that weighted list and it can be quite hefty to parse. So they may be building this behind the scenes currently. As another commenter pointed out, being able to correct a chunk and send it back to help the algorithm would be a nice feature here.

Side note: I'm dealing with this issue at the moment - if anyone has a good resource on reducing the workload, I'd love a link!

Edit: spelling

4. haram_+B8[view] [source] 2020-06-09 03:13:21
>>MengYu+(OP)
You could actually borrow some techniques from text mining to do this, e.g. probabilistic latent semantic analysis, to constantly re-train your speech recognition model by reinforcing transcriptions that semantically "make sense."
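A toy illustration of that rescoring idea (not pLSA itself): prefer the transcription whose words co-occur plausibly in the target domain. The co-occurrence weights here are invented; a real system would learn them from domain text, topics, or embeddings.

    from itertools import combinations

    # Invented co-occurrence weights for the police-radio domain.
    cooccur = {("officer", "suspect"): 0.9, ("pact", "suspect"): 0.05}

    def coherence(text):
        # Average pairwise co-occurrence score over unique words.
        words = sorted(set(text.lower().split()))
        pairs = list(combinations(words, 2))
        return sum(cooccur.get(p, 0.1) for p in pairs) / max(len(pairs), 1)

    hyps = ["the officer saw the suspect", "the officer saw the sus pact"]
    print(max(hyps, key=coherence))  # -> "the officer saw the suspect"
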
5. rasz+R8[view] [source] 2020-06-09 03:15:11
>>MengYu+(OP)
Apple dictation does this: it will change already-transcribed text mid-sentence if it thinks something else fits better.

https://news.ycombinator.com/item?id=23322321

At the 33-second mark: https://twitter.com/jamescham/status/1265512829806927873

replies(1): >>n4r9+2G
6. VWWHFS+g9[view] [source] [discussion] 2020-06-09 03:18:40
>>amluto+65
Or used a weird model like this one, which does some sort of Markov chaining to complete sentences that weren't even present in the original audio or transcription.

"foaming at the mouth" was never even close to being uttered on the radio. I'm guessing the (flawed) model inserted that part because of the proximity to the word "needle" and "assistance".

Maybe? No idea... this website is totally fucked.

replies(1): >>optimu+ja
7. optimu+ja[view] [source] [discussion] 2020-06-09 03:28:08
>>VWWHFS+g9
Hello!

The quality is currently limited by Google's API. I am working on getting some pre-trained models implemented, but voice processing is not my speciality as a software engineer.

I do NOT want to spread misinformation, nor do we want to unjustly slander anyone. Tonight I will be adding a disclaimer mentioning the limitations of our service and will make sure it is front and center on the website.

Hopefully we can create a model which can deliver better results.

8. abhgh+Tn[view] [source] 2020-06-09 06:21:45
>>MengYu+(OP)
Traditional Automatic Speech Recognition (ASR) systems do this; the component responsible is known as a Language Model (LM).

Typically you would use/train an LM for your domain or specifically for your dataset.
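A minimal sketch of such a domain LM: a bigram model with add-one smoothing, trained here on a made-up two-line radio-chatter corpus. In a real ASR pipeline this score would be interpolated with the acoustic model's score.

    import math
    from collections import Counter

    corpus = ["unit respond to a suspect on foot",
              "dispatch requesting assistance at the scene"]

    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        toks = ["<s>"] + line.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))

    V = len(unigrams)  # vocabulary size for add-one smoothing

    def log_prob(text):
        # Sum of smoothed bigram log-probabilities for the word sequence.
        toks = ["<s>"] + text.split()
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
                   for a, b in zip(toks, toks[1:]))

    # In-domain wording scores higher than an acoustically similar garble.
    print(log_prob("suspect on foot") > log_prob("sus pact on foot"))  # True
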

9. n4r9+2G[view] [source] [discussion] 2020-06-09 10:30:20
>>rasz+R8
That's slightly different to what OP was talking about, if I'm understanding correctly. You're talking about reassessing the probability of previous words based on future words. They're talking about weighting the prior probability of each word based on the context, i.e. a police conversation as opposed to a normal phone conversation.
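A made-up numerical sketch of that distinction, showing how a domain prior alone can flip the result before any look-ahead revision happens:

    # Invented numbers: P(audio|word) is ambiguous between two readings.
    acoustic = {"suspect": 0.40, "sus pact": 0.60}
    prior_phone = {"suspect": 0.05, "sus pact": 0.95}
    prior_radio = {"suspect": 0.70, "sus pact": 0.30}

    def posterior(prior):
        # Bayes: P(word|audio) proportional to P(audio|word) * P(word|context).
        scores = {w: acoustic[w] * prior[w] for w in acoustic}
        z = sum(scores.values())
        return {w: round(s / z, 2) for w, s in scores.items()}

    print(posterior(prior_phone))  # favors "sus pact"
    print(posterior(prior_radio))  # same audio now favors "suspect"
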
10. watwut+zG[view] [source] 2020-06-09 10:37:58
>>MengYu+(OP)
Imo, speech-to-text that produces apparent garbage is better than one that produces plausible stuff that is wrong. Someone, either cop or citizen, could easily end up accused of wrongdoing where no actual wrongdoing happened.