zlacker

The speech-to-text transcription is so incredibly wrong that it's almost dangerous to publish it like this.

For instance:

> at the beach view new screen for assistance there is a needle in his hand he's foaming from his mouth throwing off this item

What the officer actually said on the radio:

> He was going to Rainier Beach area. A request for assistance to approach two people with needles. <operator>: Call was from a neighbor in the area.

replies(6): >>MengYu+b2 >>ghostp+B2 >>foobie+O2 >>lysp+n6 >>asciim+y7 >>runawa+Jh

>>VWWHFS+(OP)
I don't have familiarity with speech-to-text, but wouldn't it be possible to weight words based on their probability in this application to help resolve this? For example, "suspect" is probably a low-frequency word in normal speech but very high in radio chatter.
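
A rough sketch of that idea, in case it helps. It assumes the recognizer hands back a few candidate words with acoustic scores. The counts, the vocabulary size, and the names `domain_log_prior` and `pick_word` are all made up for illustration and don't come from any real speech API.

```python
import math

def domain_log_prior(word, counts, vocab_size, alpha=1.0):
    """Add-alpha smoothed log probability of a word under the given counts."""
    total = sum(counts.values())
    return math.log((counts.get(word, 0) + alpha) / (total + alpha * vocab_size))

def pick_word(candidates, counts, vocab_size, prior_weight=1.0):
    """candidates maps word -> acoustic log-likelihood.
    Returns the word with the best acoustic score plus weighted domain prior."""
    return max(
        candidates,
        key=lambda w: candidates[w]
        + prior_weight * domain_log_prior(w, counts, vocab_size),
    )

# Hypothetical numbers: acoustically the recognizer slightly prefers "respect".
candidates = {"respect": math.log(0.55), "suspect": math.log(0.45)}
radio_counts = {"suspect": 40, "respect": 1, "unit": 30, "copy": 29}
everyday_counts = {"suspect": 1, "respect": 20, "unit": 5, "copy": 10}
vocab_size = 1000  # assumed vocabulary size, only used for smoothing

best_radio = pick_word(candidates, radio_counts, vocab_size)
best_everyday = pick_word(candidates, everyday_counts, vocab_size)
print(best_radio, best_everyday)
```

With the radio counts the prior outweighs the small acoustic gap, while the everyday counts leave the acoustic favorite in place. `prior_weight` controls how hard the domain prior pushes, and it would need tuning on real data.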

replies(6): >>amluto+h7 >>marakv+A7 >>haram_+Ma >>rasz+2b >>abhgh+4q >>watwut+KI

>>VWWHFS+(OP)
It would be cool if there was a way to listen for a bit and feed some corrected transcriptions back in to help train the algorithm better

replies(1): >>Cthulh+Jw

>>VWWHFS+(OP)
If it weren't so serious, some of them would be sort of entertaining.

>>VWWHFS+(OP)
> I know up until injuries this was in France but when I press your boobs

> Jun 9, 2020 12:17 PM

Sounds like it's getting them very wrong.

>>MengYu+b2
This needs care. Imagine if you accidentally trained a model that added racism when none was present in the audio.

replies(1): >>VWWHFS+rb

>>VWWHFS+(OP)
* For entertainment purposes only.

>>MengYu+b2
[my side project is using speech-to-text so this is amateur hour response]

Yes, but it takes a bit to build up that weighted list and it can be quite hefty to parse. So they may be building this behind the scenes currently. As another commenter pointed out, being able to correct a chunk and send it back to help the algorithm would be a nice feature here.

Side note: I'm dealing with this issue at the moment. If anyone has a good resource on reducing the workload, I'd love a link!

Ed:spelling

>>MengYu+b2
You could actually borrow some techniques from text mining to do this, e.g. probabilistic latent semantic analysis, to constantly re-train your speech recognition model by reinforcing translations that semantically "make sense."

>>MengYu+b2
Apple dictation does this: it will change already-transcribed text mid-sentence if it thinks something else fits better.

https://news.ycombinator.com/item?id=23322321

at 33 second mark https://twitter.com/jamescham/status/1265512829806927873

replies(1): >>n4r9+dI

>>amluto+h7
Or used a weird model like this one which does some sort of Markov chaining to complete sentences that weren't even present in the original audio or transcription.

"foaming at the mouth" was never even close to being uttered on the radio. I'm guessing the (flawed) model inserted that part because of the proximity to the words "needle" and "assistance".

Maybe? No idea.. this website is totally fucked.

replies(1): >>optimu+uc

>>VWWHFS+rb
Hello!

The quality is currently limited by Google's API. I am working on getting some pre-trained models implemented, but voice processing is not my speciality as a software engineer.

I do NOT want to spread misinformation, nor do we want to unjustly slander anyone. Tonight I will be adding a disclaimer mentioning the limitations of our service and will make sure it is front and center on the website.

Hopefully we can create a model which can deliver better results.

>>VWWHFS+(OP)
Well, when you come to realize that the training data for a lot of this is often someone reading the Washington Post corpus or something similar, it kind of makes sense why accuracy kind of sucks.

>>MengYu+b2
Traditional Automatic Speech Recognition (ASR) systems do this, and this component is known as a Language Model (LM).

Typically you would use/train a LM for your domain or specifically for your dataset.
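
To make that concrete, here is a toy bigram LM with add-one smoothing, trained on a few in-domain lines and used to rank two candidate transcriptions. This is a minimal sketch: real systems train on far larger corpora and combine the LM score with the acoustic score. The tiny corpus and the names `train_bigram` and `log_prob` are assumptions for illustration only.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-split tokens."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for s in sentences:
        words = s.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])          # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)

def log_prob(sentence, model):
    """Add-one smoothed bigram log probability of a sentence.
    Words never seen in training just get the smoothed floor."""
    contexts, bigrams, v = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )

# Hypothetical in-domain training text.
domain_corpus = [
    "request for assistance in the rainier beach area",
    "call was from a neighbor in the area",
    "two people with needles",
]
model = train_bigram(domain_corpus)

good = "request for assistance"
bad = "new screen for assistance"
print(log_prob(good, model), log_prob(bad, model))
```

The in-domain phrasing scores higher because its bigrams were seen in training, which is what lets a decoder prefer it over an acoustically similar but implausible alternative.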

>>ghostp+B2
Yeah, live transcribing is one of those jobs that can be easily done remotely, at scale, and crowdsourced; if one sentence is transcribed by two or three people, you can do error correction / checking as well (or just show the different interpretations). Transcripts for radio comms are important because they can be used in legal proceedings. Same with e.g. bodycam footage.
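
Something like this for the checking part: a minimal sketch that accepts a sentence when enough transcribers agree and otherwise returns every distinct version for review. The `reconcile` name and the `min_agree` threshold are assumptions, and a real system would probably compare at the word level instead of whole sentences.

```python
from collections import Counter

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't split votes."""
    return " ".join(text.lower().split())

def reconcile(transcripts, min_agree=2):
    """Return (accepted, versions).
    accepted is the most common version if it has at least min_agree votes,
    otherwise None. versions lists every distinct normalized transcript."""
    counts = Counter(normalize(t) for t in transcripts)
    text, votes = counts.most_common(1)[0]
    accepted = text if votes >= min_agree else None
    return accepted, list(counts)

accepted, versions = reconcile([
    "Call was from a neighbor",
    "call was from a  neighbor",
    "Call was from a labor",
])
print(accepted, versions)
```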

replies(1): >>watwut+rM

>>rasz+2b
That's slightly different to what OP was talking about, if I'm understanding correctly. You're talking about reassessing the probability of previous words based on future words. They're talking about weighting the prior probability of each word based on the context i.e. police conversation as opposed to normal phone conversation.

>>MengYu+b2
Imo, speech-to-text that produces obvious garbage is better than one that produces plausible-looking output that is wrong. Someone, either cop or citizen, could easily end up accused of wrongdoing where no actual wrongdoing happened.

>>Cthulh+Jw
You can't use a crowdsourced transcript for legal proceedings if the original audio was not recorded. It would be too easy for it to be poisoned by bad actors submitting intentionally bad transcripts.

replies(1): >>hammoc+s51

>>watwut+rM
I guess it would be considered hearsay at that point