I would also say that long-term, goal-oriented behavior isn't something that's well represented in the training data. We have stories about it, sometimes, but a model would need to map its own state onto those stories to learn anything from them about what to do next.
I feel like LLMs are much smarter than we are "per symbol", but we have facilities for iteration, metacognition, and saving state that give us an advantage. I think we need to find clever, minimal ways to build these "looping" contexts.
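To make the "looping context" idea concrete, here's a minimal sketch. Everything here is hypothetical: `generate` is a stand-in stub for any real LLM API, and the stop signal and note format are just illustrative choices. The point is only the shape of the loop, which carries explicit saved state (a scratchpad of notes) back into each call so the model can reflect on what it has already done.

```python
def generate(prompt: str) -> str:
    """Hypothetical LLM call; a real version would hit an actual API.
    Stub behavior: pretend the model does one step per call, then finishes."""
    step = prompt.count("NOTE:") + 1
    return f"step {step} done" if step < 3 else "DONE"

def run_loop(goal: str, max_iters: int = 10) -> list[str]:
    notes: list[str] = []  # saved state: the model's own scratchpad
    for _ in range(max_iters):
        # Re-present the goal plus accumulated notes on every iteration,
        # giving the model the iteration/memory facilities it lacks natively.
        prompt = f"GOAL: {goal}\n" + "\n".join(f"NOTE: {n}" for n in notes)
        out = generate(prompt)
        if out == "DONE":  # illustrative stop signal from the model
            break
        notes.append(out)
    return notes

print(run_loop("tie my shoes"))
```

The "clever, minimal" part is deciding what goes in `notes`: a compact self-state summary rather than a full transcript, so the loop stays cheap while still letting the model see its own trajectory.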