I think the paper should've included controls, because without them we can't tell how strong the result is; for all we know, they may have proven that humans can't reason either.
Some people will seize on any limitation of LLMs to deny there is anything to see here, while others will call that ‘moving the goalposts’; but the most interesting questions, I believe, involve figuring out what the differences actually are, setting aside whether LLMs are or are not AGIs.