It's also just not as good at being self-directed and doing all of the rest of the agent-like behaviors we expect, i.e. breaking down into todolists, determining the appropriate scope of work to accomplish, proper tool calling, etc.
Codex is the best at following instructions IME. Claude is pretty good too but is a little more "creative" than codex at trying to re-interpret my prompt to get at what I "probably" meant rather than what I actually said.
It won't make any changes until a detailed plan is generated and approved.