I strongly believe that human language is too weak (vague, inconsistent, and not expressive enough) to replace interaction with the world as a basis for building strong cognition.
We're easily fooled by the output of LLMs/LRMs because we typically use language fluency and knowledge retrieval as proxy benchmarks for intelligence among our peers.
I also wonder about the compounding effects of luck and survivorship bias when using these systems. If you model a series of interactions with these systems probabilistically, as a series of failure/success modes, then you are bound to get a sub-population of users of LLMs/LRMs who will undoubtedly have "fantastic" results. This sub-population will then espouse and promote the merits of the system. There is clearly something positive these models do, but how much of the "success" is just luck?
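The luck argument above can be made concrete with a toy simulation. The sketch below assumes hypothetical parameters (each user tries 5 tasks, each succeeding independently with probability 0.7); even then, roughly 17% of users see nothing but successes and may become enthusiastic advocates purely by chance:

```python
import random

random.seed(0)

def fraction_of_lucky_users(n_users, n_tasks, p_success):
    """Simulate n_users, each running n_tasks independent queries that
    succeed with probability p_success. Return the fraction of users
    who happened to see *only* successes."""
    lucky = 0
    for _ in range(n_users):
        if all(random.random() < p_success for _ in range(n_tasks)):
            lucky += 1
    return lucky / n_users

# Hypothetical numbers: 100k users, 5 tasks each, 70% per-task success.
rate = fraction_of_lucky_users(100_000, 5, 0.7)
print(f"fraction of all-success users: {rate:.3f}")  # expected near 0.7**5 ≈ 0.168
```

Those all-success users are the ones most likely to post glowing reports, while the far larger group with mixed results is less vocal, which is exactly the survivorship-bias dynamic described above.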