zlacker

This is very impressive.

I'm the owner of Broadcastify.com, where presumably these streams are being transcribed from. We've dabbled in this space and looked at real-world approaches to taking something like this to market, but transcribing 7000+ streams to text seems like an expensive effort, both computationally and financially, that needs a lot of investigation.

Not to mention that the individual lexicons between streams are drastically different.

I wonder how the developer has done the integration to our streams... because I never heard from them :)

replies(8): >>robbie+r1 >>lunixb+f2 >>godzil+g4 >>artemi+C4 >>ciaran+Xa >>bertmu+Wc >>optimu+Kn >>coffee+yA1

>>blanto+(OP)
Hey Lindsay, I'm the one who just added EBRCS to Calls. Wondering if a solution to both cost problems would be to (optionally) have submitters upload a transcript along with each call? Could build a model into trunk-recorder maybe?

replies(1): >>blanto+h2

>>blanto+(OP)
I prototyped this concept too, at https://feeds.talonvoice.com with prohibitively expensive Google speech recognition, but also have a feature for users to listen and fix transcriptions. If murph was anything like me they probably paid for broadcastify and tailed a couple of the static mp3 feeds.

My plan was to collect user transcription corrections on my site then train my own inexpensive models on them. The open-source speech tech I work on can do passable transcription at close to 100x faster than realtime on a quad core desktop CPU (or 200 simultaneous streams per 4-core box at 50% activity). With higher quality transcription it's closer to 10-20x faster than realtime.
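As a rough check on those figures, here is a back-of-the-envelope sketch. It assumes throughput scales linearly and that a stream only costs compute while it carries speech; the function names are made up for illustration, and the 7000-stream figure is taken from the parent comment.

```python
import math


def streams_per_box(realtime_factor: float, activity: float) -> int:
    """Number of live streams one box can keep up with.

    realtime_factor: seconds of audio transcribed per wall-clock second.
    activity: fraction of the time a stream actually carries speech, in (0, 1].
    """
    if not 0 < activity <= 1:
        raise ValueError("activity must be in (0, 1]")
    return math.floor(realtime_factor / activity)


def boxes_needed(total_streams: int, realtime_factor: float, activity: float) -> int:
    """Number of boxes needed to cover total_streams, rounded up."""
    return math.ceil(total_streams / streams_per_box(realtime_factor, activity))


print(streams_per_box(100, 0.5))     # 200 streams per box at 100x realtime
print(boxes_needed(7000, 100, 0.5))  # 35 boxes for 7000 streams
print(boxes_needed(7000, 10, 0.5))   # 350 boxes at the slower 10x setting
```

The gap between 35 and 350 boxes is the cost of the higher-quality setting under these assumptions.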

For your case you could also try to push some of the computation down to the uploading machine. These models can run on a raspberry pi.

I think the biggest work for a new effort here is going to be building local language models and collecting transcribed audio to train on. However, there have been a couple of incredible advances in the last year for semi-supervised speech recognition learning, where we can probably leverage your 1 year backlog as "unsupervised training data" while only having a small portion of it properly transcribed.

The current state-of-the-art paper uses around 100 hours of transcribed audio and 60,000 hours of unlabeled audio, and I bet you could push the 100h requirement down with a good language model and mixing in existing training data from non-radio sources.

replies(4): >>blanto+p3 >>jcims+18 >>optimu+4p >>runawa+zv

>>robbie+r1
That might be an option, but we'd have to somehow get trained models down to the client, which is one of the issues.

We're working on client ingest models now that work on more of a "tasking" perspective, where someone deploys a device that is GPS enabled and then we send an ingest task to fill in coverage, start new coverage, etc. But this is predicated on low cost ingest devices (read: RPi and RTL sticks) which might not have the horsepower needed for transcription at the client level.

replies(1): >>johann+Zp1

>>lunixb+f2
Our new project, Broadcastify Calls, might be a better fit for this. Instead of 24x7 live streams, we capture and ingest every individual call as a compressed audio file from SDRs (software-defined radios). We can then present playback, rewind, and playlists of those calls back to consumers. We're now capturing over 100 systems and 800-900 calls a minute... as we solidify the architecture it will be our new direction for how we capture and disseminate public safety audio (police scanners).

https://www.broadcastify.com/calls

replies(3): >>lunixb+Y3 >>p0sixl+H4 >>jcims+89

>>blanto+p3
The source repo to feeds.talonvoice.com includes a test ingestor that scrapes your calls API and uploads the src/dst info with the transcription, I haven't tested it live though.

>>blanto+(OP)
You could conceivably do this using the speech-to-text recognition on a Mac, using Loopback to route the stream to a microphone input.

https://rogueamoeba.com/loopback/

Someone clever enough could create containers to run the software locally and have many loops running off many streams to many instances of the audio to text feature.

replies(2): >>lunixb+v4 >>runawa+Zt

>>godzil+g4
Conventional speech recognition will not have very good accuracy for this sort of task out of the box. You will basically need to do local (as in, based on map location) language modeling, and should probably do custom acoustic modeling as well (training a neural network on what radio speech sounds like).

replies(1): >>crazyg+oc

>>blanto+(OP)
Not sure how easy/difficult this would be from an implementation perspective, but perhaps transcription could be provided as a pay-as-you-go service. I.e., users could pay into a pool for live transcription of a stream for a certain period of time, or to retroactively transcribe specific streams.

I'd have to imagine that stream listening follows some sort of power (or otherwise 80/20) law, so hopefully that would help with the expense?
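To put a rough number on that intuition, here is a small sketch that assumes listening follows a Zipf distribution (listeners proportional to 1/rank). That distribution is purely an assumption for illustration, not measured Broadcastify data; only the 7000-stream count comes from the thread.

```python
def top_share(num_streams: int, top_fraction: float, exponent: float = 1.0) -> float:
    """Share of total listening captured by the most popular streams,
    assuming stream popularity is proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_streams + 1)]
    top_n = max(1, int(num_streams * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


share = top_share(7000, 0.20)
print(f"Top 20% of streams carry about {share:.0%} of listening")
```

Under this assumed model the top fifth of streams covers a bit more than 80% of listening, so transcribing only the popular streams would serve most listeners at a fraction of the cost.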

>>blanto+p3
Hey Blatoni, big fan, and software engineer here. Any way you could add Rochester, NY (Monroe County Sheriff, and RPD) to the list of supported calls? I have an RTL SDR, but haven't been able to spend the time figuring out how to decrypt the Phase II trunking.

replies(2): >>blanto+m5 >>robota+v11

>>p0sixl+H4
You can get started as a calls ingest provider here:

https://wiki.radioreference.com/index.php/Broadcastify-Calls

>>lunixb+f2
Not to be 'that guy' but I vastly prefer your implementation. Having both the audio and transcription is almost mandatory for something like this (unless I'm an idiot and missing the ability to play the call on this).

I wonder if one could mix in openstreetmap data for a location to help pick up local references. (Eventually would be cool to round trip it with a little ping when addresses/businesses are referenced).
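A very simplified sketch of that idea: snap near-miss words in a transcript to a list of local place names. This is fuzzy post-correction using Python's `difflib`, a much cruder stand-in for real location-based language modeling. The name list is a hypothetical hand-typed sample; in practice it would be extracted from OpenStreetMap data for the stream's coverage area.

```python
import difflib

# Hypothetical sample of local names; a real list would come from OSM data.
LOCAL_NAMES = ["Genesee", "Irondequoit", "Chili", "Monroe"]


def snap_to_local_names(transcript: str, names: list, cutoff: float = 0.8) -> str:
    """Replace each word that closely matches a local name with that name."""
    lookup = {name.lower(): name for name in names}
    corrected = []
    for word in transcript.split():
        # get_close_matches returns the best candidates at or above the cutoff.
        match = difflib.get_close_matches(word.lower(), list(lookup), n=1, cutoff=cutoff)
        corrected.append(lookup[match[0]] if match else word)
    return " ".join(corrected)


print(snap_to_local_names("unit responding to genesse street", LOCAL_NAMES))
```

A high cutoff keeps ordinary words from being rewritten into place names, at the cost of missing badly garbled ones.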

replies(1): >>lunixb+ff

>>blanto+p3
Love the idea! P25 decoder seems like it needs a little tuning...can you share what you're using?

Any thoughts on adding the ability to comment/transcribe/etc?

>>blanto+(OP)
Not exactly related to this specific post, but do you worry about the slow transition to encrypted comms that emergency departments are making?

>>lunixb+v4
Did you look at the transcripts? They do not have very good accuracy. A better solution probably should do those things; this one I'm assuming is not.

replies(1): >>lunixb+De

>>blanto+(OP)
Have you taken a look at the output?

I selected the NYC scanner and found many examples like this:

June 8, 2020 9:03 PM EDT: "Google Launcher new job and I want to play better third-party colder or does the people from the vegetable okay"

>>crazyg+oc
https://news.ycombinator.com/newsguidelines.html

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

Yes, I looked at the website, and I also have my own website that does the same thing in a very similar manner. My comment was entirely in response to the suggestion that Mac speech recognition be used for this. It should not be used for this. Based on previous experiments I have personally performed, it would be even worse than the website's accuracy. I then pointed out what a good solution might look like (and neither my website nor the linked website does the good solution yet).

replies(1): >>jcims+wk

>>jcims+18
Yes, I think local language modeling would be crucial to doing this correctly

>>lunixb+De
Assuming you tried Amazon Transcribe? How did it fare?

>>blanto+(OP)
Thank you for the kind words. My team and I simply want to help people and create accountability/transparency.

We decided to ask for forgiveness on this project - we do in fact have a premium account with you guys! ffmpeg records and segments your streams, sox removes silence from scanners, Google transcribes, redis serves it all up quickly!
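A rough sketch of what the first two stages of a pipeline like this could look like. This is an assumed reconstruction for illustration, not the actual code: the URL, file names, segment length, and silence thresholds are all placeholders, and the Google and redis stages are left out. It uses ffmpeg's segment muxer and the SoX `silence` effect.

```python
import subprocess
from typing import List


def ffmpeg_segment_cmd(stream_url: str, out_pattern: str, seconds: int = 60) -> List[str]:
    """ffmpeg command that copies a live stream into fixed-length files."""
    return [
        "ffmpeg", "-i", stream_url,
        "-f", "segment", "-segment_time", str(seconds),
        "-c", "copy", out_pattern,
    ]


def sox_trim_silence_cmd(src: str, dst: str, threshold: str = "1%") -> List[str]:
    """sox command that strips silence from the start and middle of a file."""
    return [
        "sox", src, dst,
        "silence", "1", "0.1", threshold,  # trim leading silence
        "-1", "0.5", threshold,            # then drop later gaups of 0.5 s or more
    ]


def run(cmd: List[str]) -> None:
    """Run one stage; raises CalledProcessError if the tool fails."""
    subprocess.run(cmd, check=True)


# Placeholder inputs: print the commands rather than running them.
print(" ".join(ffmpeg_segment_cmd("https://example.invalid/feed.mp3", "seg_%05d.mp3")))
print(" ".join(sox_trim_silence_cmd("seg_00000.mp3", "seg_00000_trimmed.mp3")))
```

Note that SoX needs to be built with MP3 support to read these files; converting segments to WAV first is a common workaround.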

A broadcastify.com premium account is not a great excuse, but I would love to have a conversation at length with you and your team! How do we get in contact? Thanks again!

replies(1): >>robota+C01

>>lunixb+f2
Hi lunixbochs!

Your prototype is amazing! The quality of transcription is definitely better than ours via Google.

After we did some legal research we wanted to avoid storing the recordings and rather solely transcription text. Giving access to a platform for humans to verify the transcriptions and in turn train the model is a great idea.

I have started working on getting some pre-trained models set up. I am trying to implement them with wav2letter, deepspeech, kaldi, vosk, etc. - I just need to be pointed in the right direction.

Raspberry Pis were something I was considering as well - small energy footprint and powerful enough to run these models.

Do you have any advice on ML or acoustic models to avoid? I am working with the 100 hour dataset now.

Thanks!

replies(1): >>johann+hp1

>>godzil+g4
Accuracy is a little wonky even with real speech-to-text toolkits like Kaldi (which, I’ll mention, is a pain to even get started with).

I’ve had some decent results with the following:

https://cmusphinx.github.io/

I have to research how to hand tag my own samples to see if that offers significant accuracy improvements (let’s say I want to accurately transcribe one voice consistently).

Google and Watson APIs are not exactly free, and I believe Watson has an audio length limit (possibly limited by free tier, or possibly limited in general for all tiers).

Cool to see some real world attempts using this stuff.

>>lunixb+f2
I’d love to read a write up on this if you ever feel the urge.

>>optimu+Kn
Feel free to add something like this to some of the feeds from http://openmhz.com. I can give you some pointers on how to pull down the streams of audio. They are already chopped into M4A files.

>>p0sixl+H4
Hop on to https://gitter.im/trunk-recorder/Lobby if you are having trouble getting the https://github.com/robotastic/trunk-recorder software running. Trunk Recorder puts a wrapper around the OP25 software and lets you capture all of the audio from a radio system using an SDR.

>>optimu+4p
I have the same setup as Broadcastify Calls (trunkrecorder) and a site built to play each audio recording then allow the user to provide what they heard. I used it to train some public safety specific models on Kaldi and Sphinx.

I have 30ish streams and keep 6 days' worth; I could keep longer if you'd like to work together on this. I reached out to some of the people above, the Broadcastify guy for example, and they are, as mentioned, already doing their own thing, so they didn't really care about what I wanted to share.

replies(1): >>robota+pS2

>>blanto+h2
That would be cool, kind of like GroundStation or SatNOGS

>>blanto+(OP)
Hey I want to say that I love Broadcastify for listening to my local police and fire scanners so thank you for providing that free service without requiring app downloads and so on. Awesome work!

I will say, though, that during the recent unrest in my city I noticed the stream had a caption along the lines of 'police are using encrypted channels', which is understandable but disappointing from the perspective of a citizen looking for transparency.

I realize both the possible legal and technical difficulty of implementing something like this but have you had any conversations about how to maybe combat this? Times of unrest are not only a massive opportunity for Broadcastify to grow its user base but it's also when transparency is at its peak importance.

>>johann+hp1
This sounds awesome - If you have any documentation up on how to do this, I would love to point to it from the trunk-recorder wiki.