How you work without calculators is a proxy for real-world competency.
Trying to solve everything with CoT alone without utilising tools seems futile.
Heh, we really did come full circle on this! When ChatGPT launched in late 2022, one of the first things people noticed was that it sucked at math. Even basic arithmetic like 12 + 35 would trip it up. Then people "discovered" tool use and added a calculator. And everyone was like "well, that's cheating, of course it can use a calculator, but look, it can't do the simple addition logic"... And now here we are :)
And as a poker player, I can say that this game is much more challenging for computers than chess. Writing a program that can play poker really well and efficiently is an unsolved problem.
Chess engines don’t grow on trees, they’re built by intelligent systems that can think, namely human brains.
Supposedly we want to build machines that can also think, not just regurgitate things created by human brains. That’s why testing CoT is important.
It’s not actually about chess, it’s about thinking and intelligence.
Maybe at this point we should just get rid of tedious benchmarks like chess altogether. They lead people to think about how to limit AI in order to keep the benchmark relevant, rather than expanding on what is already there.
It doesn't even need to be one tool but a series of tools.
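To make the "series of tools" idea concrete, here is a minimal sketch of chaining tools so that each one's output feeds the next. The function names (`parse_sum`, `add`, `describe`, `run_chain`) are made up for illustration and do not come from any real agent framework; a real system would let the model pick which tools to call.

```python
from typing import Any, Callable


def parse_sum(text: str) -> list[int]:
    """Hypothetical tool 1: pull integer operands out of an expression like '12 + 35'."""
    return [int(part) for part in text.split("+")]


def add(operands: list[int]) -> int:
    """Hypothetical tool 2: a calculator that sums the operands."""
    return sum(operands)


def describe(total: int) -> str:
    """Hypothetical tool 3: format the result for the user."""
    return f"The answer is {total}."


def run_chain(tools: list[Callable[[Any], Any]], value: Any) -> Any:
    """Apply each tool in order, passing the previous output as the next input."""
    for tool in tools:
        value = tool(value)
    return value


print(run_chain([parse_sum, add, describe], "12 + 35"))
```

The point is only that no single tool does the whole job: parsing, arithmetic, and formatting are separate steps, and the chain is what produces the answer.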