zlacker

[parent] [thread] 16 comments
1. autojo+(OP)[view] [source] 2020-06-02 21:32:11
Is there a text transcript feature for users who may want to search through the communications? I'm curious how well those speech-to-text tools work for the audio feeds.
replies(4): >>mstade+71 >>lunixb+V3 >>panda8+B5 >>lunixb+7j
2. mstade+71[view] [source] 2020-06-02 21:38:34
>>autojo+(OP)
You could try running it through Otter[1]. We’ve used it with varying degrees of success when interviewing customers, and it’s also what powers Zoom’s transcript feature, which is what we use these days.

It’s hit and miss, in my opinion. It’ll give you a good enough base to refine the transcript from, but I’ve yet to come across a transcript that doesn’t need editing. (Which is annoying, since Zoom doesn’t give you that option.) I’d say it’s more valuable having the tool than not, but don’t expect miracles.

(I’m not affiliated with Otter or Zoom in any way.)

[1]: https://otter.ai/login

3. lunixb+V3[view] [source] 2020-06-02 21:53:45
>>autojo+(OP)
Hi, this is a difficult problem, but I've been working hard on it for a couple of days with some help. I have a pipeline and website that automatically transcribe scanner feeds, and it's working pretty well; the website also lets users correct and vote on transcriptions.

My goal is to train my own models on the corrected transcriptions (I work in the speech recognition space) so I can transcribe many live feeds inexpensively.

I will respond with a link here (hopefully very soon today) once I've fixed a couple of remaining UX bugs.
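For anyone curious what such a pipeline looks like at its core, here's a minimal sketch, not the author's actual code: chunks of feed audio go through a pluggable ASR backend, and each machine transcript stays editable so users can correct it later. The `fake_asr` stub and all names here are hypothetical stand-ins.

```python
# Hypothetical sketch: run each audio chunk through a pluggable ASR
# backend and keep the result editable for later human correction.
from dataclasses import dataclass

@dataclass
class Transcription:
    chunk_id: int
    text: str              # current best transcript for this chunk
    corrected: bool = False

def transcribe_feed(chunks, asr):
    """Transcribe every audio chunk with the given ASR callable."""
    return [Transcription(chunk_id=i, text=asr(chunk))
            for i, chunk in enumerate(chunks)]

def apply_correction(t: Transcription, new_text: str):
    """A user edit replaces the machine transcript."""
    t.text, t.corrected = new_text, True

# Stub standing in for a real engine (e.g. a cloud speech API):
fake_asr = lambda audio: f"<machine transcript of {len(audio)} samples>"

results = transcribe_feed([[0.1] * 160, [0.2] * 320], fake_asr)
apply_correction(results[0], "Engine 5, respond to Main St")
```

The corrected pairs of (audio, human transcript) are exactly the training data a specialized model would need.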

replies(1): >>ciaran+v4
4. ciaran+v4[view] [source] [discussion] 2020-06-02 21:56:19
>>lunixb+V3
I thought there were some open source speech-to-text models already [1].

Maybe there's something unique about how these low-quality radio transmissions sound that makes them ineffective?

[1] https://voice.mozilla.org/en

replies(1): >>lunixb+G4
5. lunixb+G4[view] [source] [discussion] 2020-06-02 21:57:38
>>ciaran+v4
I work in the speech recognition space and train my own models already. The existing open-source models aren't very good at noisy radio speech. I will specialize one of my models to this task once I have some data from the site.
replies(2): >>ciaran+o5 >>jcims+Z8
6. ciaran+o5[view] [source] [discussion] 2020-06-02 22:01:59
>>lunixb+G4
Got it, thanks. Good luck!
7. panda8+B5[view] [source] 2020-06-02 22:03:28
>>autojo+(OP)
Could we run the audio stream through a YouTube livestream and enable captions?
8. jcims+Z8[view] [source] [discussion] 2020-06-02 22:22:54
>>lunixb+G4
As you’re well aware but HN folks may not be, it’s not just that it’s noisy; it’s heavily coded, contextually bankrupt speech between multiple parties who spend all day in contact with each other. Dispatchers in particular seem to have a superhuman ability to extract information from completely unintelligible garbage.

Are you doing any kind of speaker identification?

replies(1): >>blanto+Ke
9. blanto+Ke[view] [source] [discussion] 2020-06-02 22:54:19
>>jcims+Z8
This is a very accurate description of the problem space. Every municipality has its own jargon, vernacular, and shorthand for communicating with brevity, which is key in public-safety communications. The communications are often digitized through less-than-optimal vocoders, and then you have the problem of recovering voice from noisy communications channels.

This is definitely a very hard problem to solve.

replies(1): >>jcims+eg
10. jcims+eg[view] [source] [discussion] 2020-06-02 23:04:03
>>blanto+Ke
Indeed. The only reason I know is that I tried a few years back and realized I was asking the computer to do something I couldn't even do myself. Anyone who doubts it: just listen to the NYPD feed and try to transcribe for a minute or two.

https://www.broadcastify.com/listen/feed/32890

(edit: also, thank you for keeping this service up and running for so long, have been a regular user since the early RR days. Would love to have a comment/live chat option if your backlog is getting bare :))

replies(1): >>lunixb+Ri
11. lunixb+Ri[view] [source] [discussion] 2020-06-02 23:23:17
>>jcims+eg
Ok, here we go: https://feeds.talonvoice.com

Repo is here if you need to report (or just fix :D) bugs in the webapp: https://github.com/lunixbochs/feeds

replies(1): >>jcims+8j
12. lunixb+7j[view] [source] 2020-06-02 23:25:03
>>autojo+(OP)
It's a hard problem. I'm prototyping this here [1]. Any user can tweak or vote on transcriptions, so my goal is to use the user annotations to help train models and make it better.

[1] https://feeds.talonvoice.com

Repo is here if you need to report (or fix) bugs in the webapp: https://github.com/lunixbochs/feeds

If you want to help with development, reach out and I can onboard + give some test data.
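The "tweak or vote" mechanic can be sketched in a few lines, purely as an illustration (the function names and sample strings are made up, not taken from the repo): each chunk accumulates candidate transcriptions with vote counts, and the top-voted candidate wins.

```python
# Hypothetical sketch of the voting side: users submit candidate
# transcriptions per audio chunk, and the top-voted one is chosen.
from collections import defaultdict

votes = defaultdict(int)  # (chunk_id, candidate_text) -> vote count

def vote(chunk_id: int, text: str):
    """Record one user vote for a candidate transcription."""
    votes[(chunk_id, text)] += 1

def best_transcription(chunk_id: int):
    """Return the highest-voted candidate for a chunk, or None."""
    candidates = [(n, t) for (c, t), n in votes.items() if c == chunk_id]
    return max(candidates)[1] if candidates else None

vote(7, "Ladder 3 responding")
vote(7, "Ladder 3 responding")
vote(7, "Latter three responding")  # a plausible mis-hearing loses 2-1
```

A real deployment would also need spam protection and tie-breaking, but the winning (audio, text) pairs are what feed back into model training.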

replies(1): >>dspoka+Uw
13. jcims+8j[view] [source] [discussion] 2020-06-02 23:25:08
>>lunixb+Ri
Whoa this is awesome! Love the option to fix a transcription, should hopefully help with training if you get some traction.
replies(1): >>lunixb+hj
14. lunixb+hj[view] [source] [discussion] 2020-06-02 23:27:13
>>jcims+8j
Thanks! I did a lot of tests and no existing ASR I found could do it to 100%, so I'm using the best ASR I could find and hoping users will help with transcriptions if they want to see it succeed and scale.
15. dspoka+Uw[view] [source] [discussion] 2020-06-03 01:15:22
>>lunixb+7j
Great to see you working on this!

I was wondering if you could estimate what it would cost to have always on recording of all these radio conversations, cost of running this speech2text ML and cost of labeling this data.

I think having these rough estimates will make donations easier for people.

replies(2): >>lunixb+8D >>imroot+A61
16. lunixb+8D[view] [source] [discussion] 2020-06-03 02:19:12
>>dspoka+Uw
Great question! Unfortunately, the long-term costs aren't clear yet. Right now I'm using Google Speech as a bootstrapping technique, but that's prohibitively expensive to run long term.

I think once my models are viable enough to do this at scale, the cost will be basically the cost of running a dedicated server per N streams, so maybe $100-300/mo per server, where N could roughly be at least 100 concurrent streams. I'll know this better in "stage 2", where I'm attempting to scale up. It's also a fairly distributable problem, so I can look into doing it folding@home style, or even have the stream's originator run transcription in some cases to keep costs down.

17. imroot+A61[view] [source] [discussion] 2020-06-03 07:18:02
>>dspoka+Uw
I've got a year-plus of the Ohio MARCS-IP site in Hamilton County, Ohio recorded. Let me know if you need some data -- I'd be more than happy to get you the dump.

(trunk-recorder + rdio scanner).

The UI is:

https://cvgscan.iwdo.xyz for the live stuff, but let me know if you're interested in the data -- my email is in my profile
