This is a long story. But your take on this question is about what the average person who responded to that survey knows, and that really shows you how little the results mean. Here are some minutiae that really matter:
1. Even if you measure quality with people in the loop, what do you ask them? Here's a passage in English, one in French: do you agree with the translation? Rate it out of 10? It turns out people aren't calibrated at all to give reasonable ratings; you get basically junk results if you run this experiment (see the sketch after this list for the usual workaround).
2. You can ask people to do head-to-head comparisons instead: do you like translation A more than translation B? But... what criteria should they use? Is accuracy what matters most? Is it how they would have translated it themselves? Is it how well A or B reads? Is it how well it represents the form of the source? Or the ideas of the source?
3. Are we measuring sentences? Paragraphs? Pages? Three-word sentences like "give me gruel" are pretty easy. Three-page translations get tricky: now you want to capture something about the style of the writer, or to notice that they're holding something back. For example, it can be really obvious in French that I'm withholding someone's gender, but not obvious at all in English. What about customs? Taboos? Do we even measure three pages' worth of translation in our NLP corpora? The respondents have no idea.
4. There are even domain-specific questions about translation. Do you know how to evaluate an English-to-French translation of a contract? One that crosses from common law to civil law? No way. You need to translate ideas there, not just words. How about medical translation? Most translation work is highly technical like this.
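The sketch promised in point 1: the usual patch for rater calibration in shared tasks like WMT's direct assessment is to standardize each annotator's raw scores before averaging. That fixes the scale problem, not the meaning problem. A minimal sketch with made-up ratings, stdlib only:

    # Two annotators use the 0-10 scale completely differently;
    # z-normalize each annotator's scores before averaging per item.
    from statistics import mean, stdev

    # annotator -> list of (translation_id, raw score out of 10)
    ratings = {
        "annotator_a": [("t1", 9), ("t2", 8), ("t3", 9)],  # rates everything high
        "annotator_b": [("t1", 5), ("t2", 2), ("t3", 4)],  # rates everything low
    }

    normalized = {}  # translation_id -> list of z-scores
    for annotator, scores in ratings.items():
        vals = [s for _, s in scores]
        mu, sigma = mean(vals), stdev(vals)
        for tid, raw in scores:
            normalized.setdefault(tid, []).append((raw - mu) / sigma)

    for tid, zs in sorted(normalized.items()):
        print(tid, round(mean(zs), 2))

Even with the scales reconciled, you still haven't answered any of the questions above about what the raters were actually judging.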
I could go on. Mostly we don't even measure these minutiae, or domain-specific translation, in our NLP benchmarks, because the tools aren't good enough for that. Nor do we measure five-page translations for their fidelity.
We actually mostly don't evaluate translations using humans at all! We collect translations from humans and then compare machine translations to those human translations after the fact, using what are called parallel corpora. (The historical example is the Hansard corpus, the proceedings of the Canadian Parliament, which are manually translated between English and French; the EU has also been a boon for translation research.)
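For the curious, "compare after the fact" in practice usually means an automatic overlap metric like BLEU, scored against the human side of a parallel corpus. A minimal sketch using the real sacrebleu library (the sentences are toy examples, not actual Hansard data):

    # Score machine translations against human reference translations
    # using BLEU (n-gram overlap), the workhorse automatic MT metric.
    import sacrebleu

    # One human reference per source sentence (e.g. the English side
    # of an English-French parallel corpus).
    references = [
        "The committee will resume its study of the bill.",
        "Members may speak for a maximum of ten minutes.",
    ]
    # The machine translations we want to score, aligned one-to-one.
    hypotheses = [
        "The committee will resume study of the bill.",
        "Members can speak for at most ten minutes.",
    ]

    # corpus_bleu takes the hypotheses plus a list of reference streams
    # (here a single stream, since we have one reference per sentence).
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"BLEU = {bleu.score:.1f}")

Note that this measures surface overlap with one particular human translation; it says nothing about style, register, or legal fidelity, which is exactly the problem.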
I'm scratching the surface here. Translation is a really complicated topic. My favourite book related to this is the Dictionary of Untranslatables: https://press.princeton.edu/books/hardcover/9780691138701/di... Not something you'd read end to end, but a really fun reference to dip into once in a while.
If someone who knows about these issues wants to say that there will be human-level translation AI in 10 years, ok, fine, I'm willing to buy that. But if someone who is ignorant of all of this tries to tell me there will be human-level translation AI in 10 years, eh, they just don't know what they're talking about. I am, by the way, only a visitor to translation research: I've published in the area, but I'm not an expert at all, and I don't even trust my own opinion on when it will be automated.
About biases: I saw appendices A and D.
Seniority doesn't follow from >1000 citations. There are master's students with 1000 citations in junk journals who happened to get a paper into a better venue. Citation count is not an indication of anything.
The way they count academia vs. industry is meaningless. There are plenty of people who have a university affiliation but are primarily at a startup. There are plenty of people who are minor coauthors on a paper, or even faculty who are mostly interested in making money off of the AI hype. There are plenty of people who graduated three years ago, for whom the paper is a wrap-up of their academic work; they counted as academic in the survey, but now they're in industry. Etc.