[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...
- familiarity with the accent and/or speaker;
- speed and style/cadence of the speech;
- any other audio that is happening that can muffle or distort the audio;
- etc.
It can also take multiple passes to get a decent transcription.