https://www.metaculus.com/questions/3479/date-weakly-general...
Like for example the discovery that language models get far better at answering complex questions if asked to show their working step by step with chain of thought reasoning as in page 19 of the PaLM paper [1]. Worth checking out the explanations of novel jokes on page 38 of the same paper. While it is, like you say, all statistics, if it's indistinguishable from valid reasoning, then perhaps it doesn't matter.