That's great, but it's demonstrably false.
I can write code that calculates the average letter frequency across any Wikipedia article. I can't do that in my head without tools, because working memory only holds about seven items at a time (the rule of seven[1]).
Tool use is absolutely an intelligence amplifier, but it isn't the same thing.
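For what it's worth, the tool-assisted version of the letter-frequency task is only a few lines. A rough Python sketch, assuming the article text has already been fetched as a plain string (the function name `letter_frequencies` is mine, and it only counts ASCII letters):

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each ASCII letter's share of all ASCII letters in `text`.

    Hypothetical helper for illustration; case is ignored and
    non-letter characters (digits, punctuation, spaces) are skipped.
    """
    letters = [ch.lower() for ch in text if ch.isascii() and ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    # Sort by letter so the output order is stable.
    return {letter: n / total for letter, n in sorted(counts.items())}


if __name__ == "__main__":
    sample = "Tool use is an intelligence amplifier."
    for letter, share in letter_frequencies(sample).items():
        print(f"{letter}: {share:.3f}")
```

Keeping a running tally of 26 counters over a few thousand words is exactly the part that doesn't fit in your head.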
> Because again, the actual “model” is just a text autocomplete engine and it generates from left to right.
This is technically true, but somewhat misleading. Humans speak "left to right" too. More to the point, LLMs do have some spatial reasoning ability (which is what you'd expect with RL training; otherwise they'd just predict the most popular token): https://snorkel.ai/blog/introducing-snorkelspatial/
[1] https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus...