The Codex App - zlacker

>>meetpa+(OP)
Somewhat underwhelmed. I consider agents to be a sidetrack. The key insight from the Recursive Language Models paper is that requirements, implementation plans, and other types of core information should not be part of context but exist as immutable objects that can be referenced as a source of truth. In practice this just means creating an .md file per stage (spec, analysis, implementation plan, implementation summary, verification and test plan, manual qa plan, global state reference doc).

I built this using PLANS.md, and it basically replicates a kanban/scrum process: gated approvals per stage, artifacts locked when work moves to the next stage, etc. It works very well and it doesn't need a UI. Sure, I could have several agents running at the same time, but I believe manual QA is key to keeping the codebase clean, so time spent on it today means future requirements can be implemented 10x faster than in a messy codebase.

>>dworks+ql1
This is what I've been doing. Iterating on specs is better than iterating on code. More token efficient and easier to review. Good code effortlessly follows from good specs. It's also a good way to stop the code turning into quicksand (aside from constraining the code with e2e tests, CLI shape, etc).

But what is your concept of "stages"? For me, the spec files are a MECE (mutually exclusive, collectively exhaustive) decomposition: each file is responsible for its own silo (one file owns repo layout, etc.), with cross-references between them where needed to eliminate redundancy. There's no hierarchy between them. But I'm open to new approaches.
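
One way to keep cross-references between flat spec files honest is to check that every link resolves to a file that exists. This is only a rough sketch, assuming the specs sit in a single directory and reference each other with ordinary Markdown links; `dangling_references` and the link pattern are hypothetical, not something described in this thread.

```python
import re
from pathlib import Path

# Assumption: cross-references are Markdown links to sibling .md files,
# e.g. [repo layout](repo-layout.md) or [flags](cli.md#flags).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def dangling_references(spec_dir: Path) -> list[tuple[str, str]]:
    """Return (source file, missing target) pairs for links that do not resolve."""
    specs = {p.name for p in spec_dir.glob("*.md")}
    problems = []
    for path in sorted(spec_dir.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # Flat layout assumed: a target must be the name of a sibling file.
            if target not in specs:
                problems.append((path.name, target))
    return problems
```

Running something like this before handing the specs to the model would catch a spec that points at a file that was renamed or never written.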

>>energy+Zx1
The stages are modelled after a kanban board. So you can have whichever stages you think are important for your LLM development workflow. These are mine:

00: Iterate on requirements with ChatGPT outside of the IDE. Save as a markdown requirements doc in the repo

01: Inside the IDE: analysis of the current codebase, scoped to the requirements

02: Based on 00 and 01, write the implementation plan. Implement the plan

03: Verification of implementation coverage and testing

04: Implementation summary

05: Manual QA based on generated doc

06: Update the global STATE.md and DECISIONS.md, which document the app and the what and why of every requirement

Every stage has a single .md as output and after the stage is finished the doc is locked. Every stage takes the previous stages' docs as input.
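
The gating and locking rules above could be sketched roughly like this. It is a minimal illustration, not the actual setup: the stage file names, the `Pipeline` class, and the use of a SHA-256 digest to detect edits to locked docs are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical stage names adapted from the list above; the real file
# names and layout are not given in the thread.
STAGES = [
    "00-requirements",
    "01-analysis",
    "02-implementation-plan",
    "03-verification",
    "04-implementation-summary",
    "05-manual-qa",
    "06-state-update",
]


class StageError(Exception):
    """Raised when a write or lock would break the stage rules."""


@dataclass
class Pipeline:
    root: Path
    locks: dict = field(default_factory=dict)  # stage name -> digest at lock time

    def doc_path(self, stage: str) -> Path:
        return self.root / f"{stage}.md"

    def inputs_for(self, stage: str) -> list:
        """Docs of all earlier stages, which serve as input to this stage."""
        return [self.doc_path(s) for s in STAGES[: STAGES.index(stage)]]

    def write(self, stage: str, text: str) -> None:
        """Write a stage doc, but only if it is unlocked and all earlier stages are locked."""
        if stage in self.locks:
            raise StageError(f"{stage} is locked")
        pending = [s for s in STAGES[: STAGES.index(stage)] if s not in self.locks]
        if pending:
            raise StageError(f"earlier stages not finished: {pending}")
        self.doc_path(stage).write_text(text, encoding="utf-8")

    def lock(self, stage: str) -> None:
        """Mark a stage as finished and record a digest of its doc."""
        path = self.doc_path(stage)
        if not path.exists():
            raise StageError(f"{stage} has no doc to lock")
        self.locks[stage] = hashlib.sha256(path.read_bytes()).hexdigest()

    def verify(self) -> list:
        """Return locked stages whose doc no longer matches its recorded digest."""
        return [
            stage
            for stage, digest in self.locks.items()
            if hashlib.sha256(self.doc_path(stage).read_bytes()).hexdigest() != digest
        ]
```

The digest check matters because nothing stops an LLM or a person from editing a file on disk; `verify()` at least makes such an edit visible before the next stage reads the doc.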

I have a half-finished draft with more details and a benchmark (I need to re-run it, since a missing dependency interrupted the runs):

https://dilemmaworks.com/implementing-recursive-language-mod...