
Figure 5 is really quite remarkable. It seems to show that normal LLMs are better at tasks where the correct answer is likely to be the next token. For tasks that require a small number of intermediate steps, current reasoning models do much better, but they break down as the number of intermediate steps grows.

This seems to indicate that the next generation of models should focus on recursively solving small parts of the problem, then function-calling another model to solve another small part and working its answer into the reasoning loop.
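
To make that concrete, here is a rough sketch of the shape I have in mind. It is a toy under loud assumptions, not anything from the paper: `toy_model`, `solve`, `Trace`, and `max_direct` are all hypothetical names, the "problem" is just summing a list, and the stand-in model is deliberately limited to two numbers at a time to mimic something that is only reliable on short chains of steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trace:
    """Running record of the reasoning loop (hypothetical structure)."""
    steps: List[str] = field(default_factory=list)


def toy_model(numbers: List[int]) -> int:
    """Hypothetical stand-in for a model call.

    It refuses anything longer than two numbers, to mimic a model that
    is only trustworthy on very small sub-problems.
    """
    if len(numbers) > 2:
        raise ValueError("toy model only handles up to 2 numbers")
    return sum(numbers)


def solve(
    numbers: List[int],
    model: Callable[[List[int]], int],
    trace: Trace,
    max_direct: int = 2,
) -> int:
    """Recursively split the problem until each piece fits the model."""
    if len(numbers) <= max_direct:
        # Small enough: hand it to the model directly.
        answer = model(numbers)
        trace.steps.append(f"direct {numbers} -> {answer}")
        return answer

    # Too big: solve each half with its own sub-call.
    mid = len(numbers) // 2
    left = solve(numbers[:mid], model, trace, max_direct)
    right = solve(numbers[mid:], model, trace, max_direct)

    # Work the sub-answers back into the loop with one more small call.
    answer = model([left, right])
    trace.steps.append(f"merge {left} + {right} -> {answer}")
    return answer


if __name__ == "__main__":
    trace = Trace()
    print(solve([1, 2, 3, 4, 5], toy_model, trace))
    for step in trace.steps:
        print(step)
```

The point of the toy is only that no single call ever sees more than two numbers, so the length of any one reasoning chain stays bounded even as the overall problem grows. Whether real models can decompose problems this cleanly is exactly the open question.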

Many seem to be citing this paper as an indication that LLMs are over. I think it indicates a clear path towards the next step-function change in their abilities.