zlacker

You could conceivably do this with the Mac's built-in speech-to-text, using Loopback to route the stream into a microphone input.

https://rogueamoeba.com/loopback/

Someone clever enough could create containers to run the software locally and have many loops running off many streams to many instances of the audio to text feature.

replies(2): >>lunixb+f >>runawa+Jp

>>godzil+(OP)
Conventional speech recognition will not have very good accuracy for this sort of task out of the box. You will basically need to do local (as in, based on map location) language modeling, and should probably do custom acoustic modeling as well (training a neural network on what radio speech sounds like).
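To make the "local language modeling" idea concrete, here is a minimal sketch, assuming the recognizer can return several candidate transcripts with acoustic scores. It rescores them with a tiny unigram model built from local text. The corpus lines, scores, and function names are all made up for illustration; a real system would use a proper n-gram or neural language model trained on far more data.

```python
import math
from collections import Counter

def train_unigram(corpus_lines):
    """Count words in a small corpus of local text (street names, unit IDs)."""
    counts = Counter(w for line in corpus_lines for w in line.lower().split())
    return counts, sum(counts.values())

def lm_score(sentence, counts, total, vocab_size_guess=10000):
    """Add-one smoothed unigram log-probability (vocab size is an assumption)."""
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size_guess))
        for w in sentence.lower().split()
    )

def rescore(hypotheses, counts, total, lm_weight=1.0):
    """Return the (text, acoustic_score) pair with the best combined score."""
    return max(
        hypotheses,
        key=lambda h: h[1] + lm_weight * lm_score(h[0], counts, total),
    )

# Hypothetical local corpus and N-best list, for illustration only.
local_corpus = [
    "engine 5 respond to halsted street",
    "medic 12 at halsted and division",
]
counts, total = train_unigram(local_corpus)
nbest = [
    ("respond to hall stead street", -10.0),  # slightly better acoustic score
    ("respond to halsted street", -10.5),     # contains the local street name
]
best = rescore(nbest, counts, total)
print(best[0])
```

Note that a unigram model like this also favors shorter hypotheses, so it only shows the shape of the idea, not something to deploy.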

replies(1): >>crazyg+88

>>lunixb+f
Did you look at the transcripts? They do not have very good accuracy. A better solution probably should do those things; this one, I'm assuming, does not.

replies(1): >>lunixb+na

>>crazyg+88
https://news.ycombinator.com/newsguidelines.html

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

Yes, I looked at the website, and I also have my own website that does the same thing in a very similar manner. My comment was entirely in response to the suggestion that Mac speech recognition be used for this. It should not be used for this. Based on previous experiments I have personally performed, it would be even worse than the website's accuracy. I then pointed out what a good solution might look like (and neither my website nor the linked website implements the good solution yet).

replies(1): >>jcims+gg

>>lunixb+na
Assuming you tried Amazon Transcribe? How did it fare?

>>godzil+(OP)
Accuracy is a little wonky even with real speech-to-text toolkits like Kaldi (which, I'll mention, is a pain to even get started with).

I’ve had some decent results with the following:

https://cmusphinx.github.io/

I have to research how to hand tag my own samples to see if that offers significant accuracy improvements (let’s say I want to accurately transcribe one voice consistently).
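One hedged way to check whether hand-tagged samples actually help is to compare word error rate (WER) on a held-out set of reference transcripts before and after adapting the model. The sketch below computes WER with a word-level edit distance; the function name and example strings are made up for illustration, and it does no text normalization beyond lowercasing.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Row-by-row Levenshtein distance over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion (word missing from hypothesis)
                cur[j - 1] + 1,      # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)

# Hypothetical reference and recognizer output.
wer = word_error_rate(
    "engine five respond to main street",
    "engine five respond main st",
)
print(f"WER: {wer:.2f}")
```

Running the same set of recordings through the recognizer before and after adding your own samples, and comparing the two WER numbers, gives a rough answer to whether the tagging effort was worth it.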

The Google and Watson APIs are not exactly free, and I believe Watson has an audio length limit (possibly only on the free tier, or possibly on all tiers).

Cool to see some real world attempts using this stuff.