It's not currently streaming any feeds because Google Speech is pretty expensive, but I have the expertise, and I plan to train my own speech models that would be cheaper to run and more accurate than Google on this kind of audio.
The main difference between this and murph is that my `feeds` site has a UI for users to listen to the audio and fix/vote on transcriptions, and corrections are propagated quickly to other users.
This is close enough to the Seattle feed that you can do a compare & contrast.
Heard: "clear my first call ocean nora 470"
On the site: "charlie my first call"
So, yeah, this still has a long, long way to go. I considered and discarded this idea in 2011 because it was pure insanity, and as another comment suggests, it's highly context-sensitive.
"ECR" is El Camino Real. "Vets" is Veterans Blvd.
But...
"Code 99" is the emergency button... for one department... and it means something else entirely for another, just 20 miles apart.
I'd love to have it, but it still seems out of reach.
My plan was to collect user transcription corrections on my site and then train my own inexpensive models on them. The open-source speech tech I work on can do passable transcription at close to 100x faster than realtime on a quad-core desktop CPU (or about 200 simultaneous streams per 4-core box, assuming 50% activity on each stream). With higher-quality transcription it's closer to 10-20x faster than realtime.
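To spell out the arithmetic behind those stream counts (the figures below are just my own assumed numbers, not benchmarks):

```python
# Rough capacity estimate: if a 4-core box transcribes audio at N times
# realtime in aggregate, and each feed only carries speech part of the time,
# the number of simultaneous feeds it can keep up with is roughly N / activity.

def max_streams(realtime_factor: float, activity: float) -> int:
    """realtime_factor: hours of audio transcribed per wall-clock hour.
    activity: fraction of each stream that actually contains speech."""
    return int(realtime_factor / activity)

print(max_streams(100, 0.5))  # -> 200 streams per 4-core box
print(max_streams(15, 0.5))   # -> 30 streams with the higher-quality models
```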
For your case, you could also try to push some of the computation down to the uploading machine; these models can run on a Raspberry Pi.
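As a rough sketch of what transcribing on the uploading machine could look like, here's a minimal offline loop using a DeepSpeech-style model, purely as an example; the model files and WAV name are placeholders, not a specific recommendation:

```python
import wave
import numpy as np
from deepspeech import Model  # pip install deepspeech; ARM builds run on a Pi

# Placeholder paths -- substitute whatever acoustic model/scorer you end up training.
model = Model("model.tflite")
model.enableExternalScorer("scanner-lm.scorer")

def transcribe(wav_path: str) -> str:
    # The model expects 16 kHz, 16-bit mono PCM.
    with wave.open(wav_path, "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return model.stt(audio)

print(transcribe("example-call.wav"))  # hypothetical recorded call
```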
I think the biggest work for a new effort here is going to be building local language models and collecting transcribed audio to train on. However, there have been a couple of incredible advances in the last year in semi-supervised speech recognition, where we could probably leverage your one-year backlog as "unsupervised training data" while only having a small portion of it properly transcribed.
The current state-of-the-art paper uses around 100 hours of transcribed audio and 60,000 hours of unlabeled audio, and I bet you could push the 100h requirement down with a good language model and by mixing in existing training data from non-radio sources.
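For the flavor of it, here's a generic pseudo-labeling loop; this is the broad semi-supervised idea, not the specific method from that paper, and `load_manifest`, `list_audio`, `train`, and `transcribe_with_confidence` are hypothetical stand-ins for whatever toolkit you use:

```python
# Bootstrap from the small hand-transcribed set, then let the model label the
# big unlabeled backlog and keep only the transcripts it is confident about
# for the next round of training. All helper functions are hypothetical.

labeled = load_manifest("hand_transcribed.csv")  # the small human-checked set
unlabeled = list_audio("backlog/")               # the one-year backlog

model = train(labeled)
for _ in range(3):
    pseudo = []
    for clip in unlabeled:
        text, confidence = transcribe_with_confidence(model, clip)
        if confidence > 0.9:                     # keep only confident guesses
            pseudo.append((clip, text))
    model = train(labeled + pseudo)              # retrain on the mixture
```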
https://rogueamoeba.com/loopback/
Someone clever enough could create containers to run the software locally, with many loops running off many streams into many instances of the audio-to-text feature.
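A rough sketch of that fan-out, with `capture_chunks` and `speech_to_text` as hypothetical stand-ins for whatever capture tool and recognizer end up inside the containers:

```python
# One worker per stream, each piping a captured audio loop into a local
# speech-to-text instance. Stream names and helper functions are placeholders.
from multiprocessing import Process

STREAMS = ["stream-a", "stream-b", "stream-c"]

def worker(stream_name: str) -> None:
    for chunk in capture_chunks(stream_name):    # audio chunks from the loop
        print(stream_name, speech_to_text(chunk))

if __name__ == "__main__":
    procs = [Process(target=worker, args=(s,)) for s in STREAMS]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```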
https://wiki.radioreference.com/index.php/Broadcastify-Calls
There's apparently some uncertainty around handling of encrypted emergency services communications: https://www.rtdna.org/content/scanners
https://www.fcc.gov/consumers/guides/interception-and-divulg...
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."
Yes, I looked at the website, and I also have my own website that does the same thing in a very similar manner. My comment was entirely in response to the suggestion that Mac speech recognition be used for this. It should not be used for this; based on previous experiments I have personally performed, it would be even worse than the website's accuracy. I then pointed out what a good solution might look like (and neither my website nor the linked website does the good solution yet).
https://news.ycombinator.com/item?id=23322321
At the 33-second mark: https://twitter.com/jamescham/status/1265512829806927873
I’ve had some decent results with the following:
I have to research how to hand-tag my own samples to see if that offers significant accuracy improvements (say I want to accurately transcribe one voice consistently).
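In case it helps, most open-source toolkits just want a manifest of (audio file, transcript) pairs for fine-tuning. A minimal sketch, assuming one short WAV plus a matching .txt transcript per clip (the clips/ layout here is made up for illustration):

```python
# Build a training manifest from hand-tagged clips: one short WAV per
# utterance plus a .txt transcript you typed yourself.
import csv
from pathlib import Path

rows = []
for wav in sorted(Path("clips").glob("*.wav")):
    transcript = wav.with_suffix(".txt").read_text().strip().lower()
    rows.append({"wav_filename": str(wav), "transcript": transcript})

with open("train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["wav_filename", "transcript"])
    writer.writeheader()
    writer.writerows(rows)
```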
The Google and Watson APIs are not exactly free, and I believe Watson has an audio-length limit (possibly only on the free tier, or possibly across all tiers).
Cool to see some real world attempts using this stuff.