In the single-city case he had volunteers check the speech-to-text output and manually fill in addresses that were missed. His accuracy rate was quite low with 2014-era tools, so transcribing the addresses from the recordings took a lot of manual work. I suppose speech-to-text tools have improved since then, but these recordings are still quite dirty. You're talking about analog radio recordings of poorly trained personnel with regional accents, mumbling, inconsistent phrasing, etc.
Try to find speech-to-text tooling that gets good accuracy on a sample source like the recordings on liveatc.net (air traffic control frequencies), and see how accurate your results are. Note the difference between controllers (who are trained in proper phraseology and speaking technique) and pilots (who are not).
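One way to put a number on "how accurate" is word error rate (WER): the word-level edit distance between a hand-made reference transcript and the tool's output, divided by the number of reference words. Below is a minimal sketch, assuming you've hand-transcribed a few clips yourself and tagged each one as controller or pilot. The function names and the crude normalization (lowercase, keep only letters, digits and apostrophes) are my own choices, not part of any particular tool. Real ATC audio would also need number normalization ("two seven" vs "27"), which this skips.

```python
import re
from statistics import mean


def normalize(text: str) -> list[str]:
    # Hypothetical, crude normalization: lowercase and keep only word-ish tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word was missed
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_by_role(samples: list[tuple[str, str, str]]) -> dict[str, float]:
    """Mean WER per speaker role, given (role, reference, hypothesis) tuples."""
    scores: dict[str, list[float]] = {}
    for role, reference, hypothesis in samples:
        scores.setdefault(role, []).append(word_error_rate(reference, hypothesis))
    return {role: mean(values) for role, values in scores.items()}


if __name__ == "__main__":
    # Made-up example lines, not real liveatc.net transcripts.
    samples = [
        ("controller", "cleared to land runway two seven left",
                       "cleared to land runway two seven left"),
        ("pilot", "cleared to land two seven left",
                  "clear to land to seven"),
    ]
    print(wer_by_role(samples))
```

Splitting the scores by role is what lets you see the controller/pilot gap directly instead of one blended number.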