To me this still feels like the wrong way to interact with a coding agent. Does this actually lead people to success? I've never seen it not go off the rails in some way unless you set clear boundaries on the scope of the expected change. It'll write code before you even want it to, it'll do the test first or the logic first, whichever order you didn't want, it'll be much too verbose or much too hacky, and so on.
First phase: plan. Mandatory to complete, and mandatory to get AI feedback on from a separate context or model. Iterate until the plan is solid.
Only then move on to the second phase: make the edits (roughly the loop sketched below).
Better planning == Better execution
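Concretely, the loop I have in mind looks something like this sketch. Every name in it (ask_model, the planner/reviewer/coder roles, the "LGTM" check) is made up for illustration, not any real tool's API; the only point is that the plan gets critiqued from a separate context before any code is written.

```python
# Hypothetical sketch of the two-phase workflow. ask_model and the
# planner/reviewer/coder roles are stand-ins, not a real agent API.

def ask_model(role: str, prompt: str) -> str:
    """Stand-in for whatever chat/agent call you actually use."""
    raise NotImplementedError

def run_issue(issue: str, max_rounds: int = 3) -> str:
    # Phase 1: draft a plan, then have a *separate* model/context critique it.
    plan = ask_model("planner", f"Draft a step-by-step plan for:\n{issue}\nNo code yet.")
    for _ in range(max_rounds):
        feedback = ask_model("reviewer", f"Critique this plan for:\n{issue}\n\nPlan:\n{plan}")
        if "LGTM" in feedback:
            break
        plan = ask_model(
            "planner",
            f"Revise the plan.\nIssue:\n{issue}\nPlan:\n{plan}\nFeedback:\n{feedback}",
        )

    # Phase 2: only now is the agent allowed to touch code, and only per the plan.
    return ask_model("coder", f"Implement exactly this plan, nothing more:\n{plan}")
```

The detail that matters is that the reviewer sits in a separate context, so it isn't anchored on the planner's own reasoning, and that no edits happen until the plan survives that review.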
With Codex, I increasingly can skip the plan step and it just toils along until it has finished the issue. It can be "lazy" at times and stop to ask before going ahead more often, but usually at a reasonable scope (and sometimes at exactly the points where I think other services would have charged ahead on a wrong tangent and burnt more tokens of their more limited usage).
I wouldn't be surprised if, within the next one or two model iterations, a separate plan step is no longer worth the effort, given a good enough initial written issue.