It's not currently streaming any feeds because Google Speech is pretty expensive, but I have the expertise, and I plan to train my own speech models that would be cheaper to run and more accurate than Google on this kind of audio.
The main difference between this and murph is that my `feeds` site has a UI for users to listen to the audio and fix/vote on transcriptions, and corrections are propagated quickly to other users.
This is close enough to the Seattle feed that you can do a compare & contrast.
Heard: "clear my first call ocean nora 470"
On the site: "charlie my first call"
So, yeah, this still has a long, long way to go. I considered and discarded this idea in 2011 because it was pure insanity, and as another comment suggests, it's highly context-sensitive.
"ECR" is El Camino Real. "Vets" is Veterans Blvd.
But...
"Code 99" is the emergency button... for one department... and it means something else entirely for another, just 20 miles apart.
I'd love to have it, but it still seems out of reach.
My plan was to collect user transcription corrections on my site and then train my own inexpensive models on them. The open-source speech tech I work on can do passable transcription at close to 100x faster than realtime on a quad-core desktop CPU (or about 200 simultaneous streams per 4-core box, assuming 50% activity on each stream). With higher-quality transcription it's closer to 10-20x faster than realtime.
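To spell out the arithmetic behind those stream counts (the figures below are just my own assumed numbers, not benchmarks):

```python
# Rough capacity estimate: if a 4-core box transcribes audio at N times
# realtime in aggregate, and each feed only carries speech part of the time,
# the number of simultaneous feeds it can keep up with is roughly N / activity.

def max_streams(realtime_factor: float, activity: float) -> int:
    """realtime_factor: hours of audio transcribed per wall-clock hour.
    activity: fraction of each stream that actually contains speech."""
    return int(realtime_factor / activity)

print(max_streams(100, 0.5))  # -> 200 streams per 4-core box
print(max_streams(15, 0.5))   # -> 30 streams with the higher-quality models
```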
For your case, you could also try to push some of the computation down to the uploading machine; these models can run on a Raspberry Pi.
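As a rough sketch of what transcribing on the uploading machine could look like, here's a minimal offline loop using a DeepSpeech-style model, purely as an example; the model files and WAV name are placeholders, not a specific recommendation:

```python
import wave
import numpy as np
from deepspeech import Model  # pip install deepspeech; ARM builds run on a Pi

# Placeholder paths -- substitute whatever acoustic model/scorer you end up training.
model = Model("model.tflite")
model.enableExternalScorer("scanner-lm.scorer")

def transcribe(wav_path: str) -> str:
    # The model expects 16 kHz, 16-bit mono PCM.
    with wave.open(wav_path, "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return model.stt(audio)

print(transcribe("example-call.wav"))  # hypothetical recorded call
```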
I think the biggest work for a new effort here is going to be building local language models and collecting transcribed audio to train on. However, there have been a couple of incredible advances in the last year in semi-supervised speech recognition, where we could probably leverage your one-year backlog as "unsupervised training data" while only having a small portion of it properly transcribed.
The current state-of-the-art paper uses around 100 hours of transcribed audio and 60,000 hours of unlabeled audio, and I bet you could push the 100h requirement down with a good language model and by mixing in existing training data from non-radio sources.
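For the flavor of it, here's a generic pseudo-labeling loop; this is the broad semi-supervised idea, not the specific method from that paper, and `load_manifest`, `list_audio`, `train`, and `transcribe_with_confidence` are hypothetical stand-ins for whatever toolkit you use:

```python
# Bootstrap from the small hand-transcribed set, then let the model label the
# big unlabeled backlog and keep only the transcripts it is confident about
# for the next round of training. All helper functions are hypothetical.

labeled = load_manifest("hand_transcribed.csv")  # the small human-checked set
unlabeled = list_audio("backlog/")               # the one-year backlog

model = train(labeled)
for _ in range(3):
    pseudo = []
    for clip in unlabeled:
        text, confidence = transcribe_with_confidence(model, clip)
        if confidence > 0.9:                     # keep only confident guesses
            pseudo.append((clip, text))
    model = train(labeled + pseudo)              # retrain on the mixture
```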
https://rogueamoeba.com/loopback/
Someone clever enough could create containers to run the software locally, with many loops running off many streams into many instances of the audio-to-text feature.
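A rough sketch of that fan-out, with `capture_chunks` and `speech_to_text` as hypothetical stand-ins for whatever capture tool and recognizer end up inside the containers:

```python
# One worker per stream, each piping a captured audio loop into a local
# speech-to-text instance. Stream names and helper functions are placeholders.
from multiprocessing import Process

STREAMS = ["stream-a", "stream-b", "stream-c"]

def worker(stream_name: str) -> None:
    for chunk in capture_chunks(stream_name):    # audio chunks from the loop
        print(stream_name, speech_to_text(chunk))

if __name__ == "__main__":
    procs = [Process(target=worker, args=(s,)) for s in STREAMS]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```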
https://wiki.radioreference.com/index.php/Broadcastify-Calls
There's apparently some uncertainty around handling of encrypted emergency services communications: https://www.rtdna.org/content/scanners
https://www.fcc.gov/consumers/guides/interception-and-divulg...
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."
Yes, I looked at the website, and I also have my own website that does the same thing in a very similar manner. My comment was entirely in response to the suggestion that Mac speech recognition be used for this. It should not be used for this; based on previous experiments I have personally performed, it would be even worse than the website's accuracy. I then pointed out what a good solution might look like (and neither my website nor the linked website does the good solution yet).
https://news.ycombinator.com/item?id=23322321
At the 33-second mark: https://twitter.com/jamescham/status/1265512829806927873
I’ve had some decent results with the following:
I have to research how to hand-tag my own samples to see if that offers significant accuracy improvements (say I want to accurately transcribe one voice consistently).
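In case it helps, most open-source toolkits just want a manifest of (audio file, transcript) pairs for fine-tuning. A minimal sketch, assuming one short WAV plus a matching .txt transcript per clip (the clips/ layout here is made up for illustration):

```python
# Build a training manifest from hand-tagged clips: one short WAV per
# utterance plus a .txt transcript you typed yourself.
import csv
from pathlib import Path

rows = []
for wav in sorted(Path("clips").glob("*.wav")):
    transcript = wav.with_suffix(".txt").read_text().strip().lower()
    rows.append({"wav_filename": str(wav), "transcript": transcript})

with open("train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["wav_filename", "transcript"])
    writer.writeheader()
    writer.writerows(rows)
```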
The Google and Watson APIs are not exactly free, and I believe Watson has an audio-length limit (possibly only on the free tier, or possibly across all tiers).
Cool to see some real world attempts using this stuff.