The Codex App - zlacker

>>meetpa+(OP)
Somewhat underwhelmed. I consider agents to be a sidetrack. The key insight from the Recursive Language Models paper is that requirements, implementation plans, and other types of core information should not be part of the context but should exist as immutable objects that can be referenced as a source of truth. In practice this just means creating an .md file per stage (spec, analysis, implementation plan, implementation summary, verification and test plan, manual QA plan, global state reference doc).

I created this using PLANS.md and it basically replicates a kanban/scrum process with gated approvals per stage, locked artifacts when work moves to the next stage, etc. It works very well and it doesn't need a UI. Sure, I could have several agents running at the same time, but I believe manual QA is key to keeping the codebase clean, so time spent on this today means that future requirements can be implemented 10x faster than with a messy codebase.
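
To make the "gated approvals, locked artifacts" idea concrete, here is a minimal sketch of how it could look in code. It is not the poster's actual setup: the `StagePipeline` class, its method names, and the one-file-per-stage layout are all assumptions made for illustration, and the global state reference doc is left out on the assumption that it is updated continuously rather than gated.

```python
import hashlib
from pathlib import Path

# Stage names adapted from the post; file names and the class API are assumptions.
STAGES = [
    "spec",
    "analysis",
    "implementation_plan",
    "implementation_summary",
    "verification_and_test_plan",
    "manual_qa_plan",
]


def _digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class StagePipeline:
    """Hypothetical gated pipeline: one .md artifact per stage, locked on approval."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.locked = {}  # stage name -> sha256 recorded at approval time
        self._index = 0   # position of the stage currently open for edits

    @property
    def current_stage(self):
        # None once every stage has been approved.
        return STAGES[self._index] if self._index < len(STAGES) else None

    def artifact(self, stage):
        return self.root / f"{stage}.md"

    def write(self, stage, text):
        # Only the currently open stage may be written through the pipeline.
        if stage in self.locked:
            raise PermissionError(f"{stage} is locked; it was already approved")
        if stage != self.current_stage:
            raise ValueError(f"{stage} is not open; current stage is {self.current_stage}")
        self.artifact(stage).write_text(text, encoding="utf-8")

    def approve(self):
        # The gate: record a hash of the artifact, then open the next stage.
        stage = self.current_stage
        if stage is None:
            raise ValueError("all stages are already approved")
        path = self.artifact(stage)
        if not path.exists():
            raise FileNotFoundError(f"nothing to approve: {path.name} has not been written")
        self.locked[stage] = _digest(path)
        self._index += 1

    def tampered(self):
        # Locked stages whose file no longer matches the approved hash.
        return [s for s, h in self.locked.items() if _digest(self.artifact(s)) != h]
```

The hash check is the part that makes an approved artifact usable as a source of truth: anything (human or agent) that edits a locked file outside the pipeline shows up in `tampered()`.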

>>dworks+ql1
Which paper?

>>boppo1+8F1
Recursive Language Models by Alex Zhang/MIT

>>dworks+IS1
@dworks: Good insights. Thanks!

If you add a dialectic between Opus 4.5 and GPT 5.2 (not the Codex variant), your workflow - which I use as well, albeit slightly differently [1] - may work even better.

This dialectic also has the happy side effect of being fairly token efficient.

IME, Claude Code employs much better CLI tooling+sandboxing when implementing while GPT 5.2 does excellent multifaceted critique even in complex situations.

[1]

- spec requirement / iterate spec until dialectic is exhausted, then markdown

- plan / iterate plan until dialectic is exhausted, then markdown

- implement / curl-test + manual test / code review until dialectic is exhausted

- update previous repo context checkpoint (plus README.md and AGENTS.md) in markdown
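
The "iterate until dialectic is exhausted" step in the list above can be sketched as a small loop. This is only an illustration: `run_dialectic`, `revise`, and `critique` are hypothetical names, the two callables stand in for whichever models play author and critic, and the `max_rounds` cap is an added assumption so the loop always terminates.

```python
from typing import Callable, List, Tuple


def run_dialectic(
    draft: str,
    revise: Callable[[str, List[str]], str],
    critique: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Alternate critique and revision until the critic has no objections.

    Returns the final draft and the number of revisions that were made.
    """
    for revisions_so_far in range(max_rounds):
        objections = critique(draft)
        if not objections:
            # Dialectic exhausted: the critic has nothing left to raise.
            return draft, revisions_so_far
        draft = revise(draft, objections)
    # Round cap reached; the draft may still have open objections.
    return draft, max_rounds


# Toy stand-ins for the two models (assumptions, not real model calls).
def toy_critique(draft: str) -> List[str]:
    required = ["inputs", "error handling"]
    return [f"missing section: {name}" for name in required if name not in draft]


def toy_revise(draft: str, objections: List[str]) -> str:
    additions = [o.split(": ", 1)[1] for o in objections]
    return draft + "\n" + "\n".join(f"## {a}" for a in additions)
```

With the toy functions, `run_dialectic("# spec", toy_revise, toy_critique)` makes one revision and then stops because the critic returns an empty list. The final draft is what would be written out to markdown for that stage.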

>>varsha+0o2
Adding another external model/agent is exactly what I have been planning as the next step. In fact I already paste the implementation and test summaries into ChatGPT, and it is extremely helpful in hardening requirements, making them more extensible, or picking up gaps between the implementations and the initial specs. It would be very useful to have this in the workflow itself, rather than the coding agent reviewing its own work - there is a sense that it is getting tunnel-visioned.

I agree that CC seems like a better harness, but I think GPT is a better model. So I will keep it all inside the Codex VSCode plugin workflow.