i.e. the tell that it's not human is that it is too perfectly human.
However, if we could transport people from 2012 to today and run the test on them, none would guess that the LLM output came from a computer.
Also, the skill of the human opponents matters. There’s a difference between testing a chess bot against randomly selected college undergrads versus chess grandmasters.
Just as jailbreaks are not hard to find, figuring out exploits that get LLMs to reveal themselves probably wouldn't be that hard. But to play the game at all, someone would first need to train LLMs that don't immediately admit they're bots.