I created this using PLANS.md, and it basically replicates a kanban/scrum process with gated approvals per stage, artifacts that lock when work moves to the next stage, etc. It works very well and it doesn't need a UI. Sure, I could have several agents running at the same time, but I believe manual QA is key to keeping the codebase clean, so time spent on this today means that future requirements can be implemented 10x faster than with a messy codebase.
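To make the "gated approvals plus locked artifacts" idea concrete, here is a minimal sketch in Python. It is not the actual PLANS.md setup; the stage names, the `Ticket` class, and its methods are all hypothetical and only illustrate the mechanics: a ticket cannot advance without an explicit approval, and once it advances, the artifact of the stage it left becomes read-only.

```python
from dataclasses import dataclass, field

# Hypothetical stage names; the original process may use different ones.
STAGES = ["spec", "plan", "implement", "review", "done"]


@dataclass
class Ticket:
    title: str
    stage_index: int = 0
    artifacts: dict = field(default_factory=dict)  # stage name -> artifact text
    locked: set = field(default_factory=set)       # stages whose artifacts are frozen

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def write_artifact(self, stage: str, text: str) -> None:
        # Artifacts of stages that were already left behind cannot be edited.
        if stage in self.locked:
            raise PermissionError(f"artifact for '{stage}' is locked")
        self.artifacts[stage] = text

    def advance(self, approved: bool) -> None:
        # The gate: a human has to approve before the ticket moves on.
        if not approved:
            raise ValueError("manual approval is required to advance")
        if self.stage_index >= len(STAGES) - 1:
            raise ValueError("ticket is already in the final stage")
        self.locked.add(self.stage)  # freeze the artifact of the stage being left
        self.stage_index += 1
```

For example, after `ticket.write_artifact("spec", "...")` and `ticket.advance(approved=True)`, the ticket is in `plan` and any further write to the `spec` artifact raises `PermissionError`. In a file-based setup the same effect could come from plain markdown files plus a convention, so no UI is needed.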
If you add a dialectic between Opus 4.5 and GPT 5.2 (not the Codex variant), your workflow - which I use as well, albeit slightly differently [1] - may work even better.
This dialectic also has the happy side-effect of being fairly token efficient.
IME, Claude Code employs much better CLI tooling and sandboxing when implementing, while GPT 5.2 does excellent multifaceted critique even in complex situations.
[1]
- spec requirement / iterate spec until dialectic is exhausted, then markdown
- plan / iterate plan until dialectic is exhausted, then markdown
- implement / curl-test + manual test / code review until dialectic is exhausted
- update previous repo context checkpoint (plus README.md and AGENTS.md) in markdown
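The "iterate until dialectic is exhausted" step in the list above can be sketched roughly as the loop below. This is only an illustration under assumptions: "exhausted" is read here as "the critic has no further objections", the `critique` and `revise` callables are hypothetical stand-ins for whatever calls the two models, and `max_rounds` is an added safety cap that the original workflow does not mention.

```python
from typing import Callable, List, Tuple


def run_dialectic(
    draft: str,
    critique: Callable[[str], List[str]],     # hypothetical: critic model returns objections
    revise: Callable[[str, List[str]], str],  # hypothetical: author model revises the draft
    max_rounds: int = 5,                      # assumed safety cap, not part of the original
) -> Tuple[str, int]:
    """Alternate critique and revision until the critic has nothing left to say.

    Returns the final draft and the number of revisions that were made.
    """
    for round_number in range(max_rounds):
        objections = critique(draft)
        if not objections:
            # Dialectic exhausted: this draft is what gets written to markdown.
            return draft, round_number
        draft = revise(draft, objections)
    # Cap reached; the draft may still have open objections.
    return draft, max_rounds


# Toy stand-ins so the sketch runs without any model API.
def toy_critique(text: str) -> List[str]:
    return [] if "error handling" in text else ["missing error handling"]


def toy_revise(text: str, objections: List[str]) -> str:
    return text + "\n- error handling"


final_spec, revisions = run_dialectic("spec: login endpoint", toy_critique, toy_revise)
```

The same loop would apply to the spec, plan, and code review steps, with only the prompts behind `critique` and `revise` changing.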
I agree that CC seems like a better harness, but I think GPT is a better model. So I will keep it all inside the Codex VSCode plugin workflow.