Advancing AI Benchmarking with Game Arena

>>salkah+(OP)
If AI can program, why does it matter whether it can play chess using CoT when it can program a chess engine instead? This applies to other domains as well.

>>10xDev+td
It can write a chess engine because it has read the code of a thousand chess engines. This benchmark measures a different aspect of intelligence.

And as a poker player, I can say that this game is much more challenging for computers than chess. Writing a program that can play poker really well and efficiently is an unsolved problem.

>>Rivier+OE
The most popular form was solved in 2019: https://en.wikipedia.org/wiki/Pluribus_(poker_bot)

>>marksi+OI1
Pluribus didn't solve poker. It's limited to fixed starting stack sizes. It can't exploit weak opponents; it tries to approach a Nash equilibrium, but in multiplayer poker a Nash equilibrium doesn't have the theoretical guarantees it does in heads-up play. And lastly, it requires a ton of compute.
