Also, the skill of the human opponents matters. There’s a difference between testing a chess bot against randomly selected college undergrads versus chess grandmasters.
Just like jailbreaks are not hard to find, figuring out exploits to get LLM’s to reveal themselves probably wouldn’t be that hard? But to even play the game at all, someone would need to train LLM’s that don’t immediately admit that they’re bots.