> People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. ...
Is this what people are really doing? Who is just turning AI loose to modify things as it sees fit? If I'm not directing the work, how does it even know what to do?
I've been subjected to forced LLM integration from management, and there are no "Agents" anywhere that I've seen.
Is anyone here doing this that can explain it?
I think it's really hard to overstate how important agents are.
We have an intuition for LLMs as a function blob -> blob (really, token -> token, but whatever), and the limitations of such a function, ping-ponging around in its own state space, like a billion monkeys writing plays.
But you can also go blob -> json, and json -> tool-call -> blob. The json->tool interaction isn't stochastic; it's simple systems code (the LLM could indeed screw up the JSON, since that process is stochastic --- but it doesn't matter, because the agent isn't stochastic and won't accept it, and the LLM will just do it over). The json->tool-call->blob process is entirely fixed system code --- and simple code, at that.
Doing this grounds the code generation process: it now has a directed stochastic structure and a closed loop.
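A minimal sketch of that loop in Python, where `call_llm` and `run_tool` are stand-ins for whatever model API and tool dispatcher a real agent actually uses:

```python
import json

def agent_step(call_llm, run_tool, prompt, max_retries=3):
    """Ask the LLM for a tool call, reject anything that isn't valid JSON,
    and execute the accepted call with plain, deterministic code."""
    for _ in range(max_retries):
        raw = call_llm(prompt)  # stochastic: free-form text from the model
        try:
            call = json.loads(raw)               # deterministic, strict parse
            tool, args = call["tool"], call["args"]
        except (ValueError, KeyError, TypeError):
            # The model flubbed the JSON; don't accept it, just ask again.
            prompt = ('Your last reply was not valid JSON of the form '
                      '{"tool": ..., "args": ...}. Try again.')
            continue
        return run_tool(tool, args)              # fixed systems code, not the LLM
    raise RuntimeError("model never produced a valid tool call")
```

The point is that the only stochastic part is the text the model emits; everything that touches your repo or your tools sits behind that deterministic validation step.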
What is an actual, real world example?
The agent code runs a regex that recognizes your prompt as a reference to a JIRA issue, and runs a small curl with predefined credentials to download the bug description.
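Concretely, that first step might look something like this sketch (the ticket-key regex is generic, and the Atlassian URL and environment variables are placeholders for whatever instance and credentials the agent is configured with):

```python
import os
import re
import requests

JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # e.g. "PROJ-1234"

def fetch_issue(user_prompt: str) -> dict | None:
    """If the prompt mentions a JIRA ticket, download its description."""
    match = JIRA_KEY.search(user_prompt)
    if match is None:
        return None
    key = match.group(0)
    resp = requests.get(
        f"https://yourcompany.atlassian.net/rest/api/2/issue/{key}",
        auth=(os.environ["JIRA_USER"], os.environ["JIRA_TOKEN"]),  # predefined credentials
        timeout=30,
    )
    resp.raise_for_status()
    fields = resp.json()["fields"]
    return {"key": key, "summary": fields["summary"], "description": fields["description"]}
```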
It then assembles a larger text prompt such as "you will act as a master coder to understand and fix the following issue as faithfully as you can: {JIRA bug description inserted here}. You will do so in the context of the following code: {contents of 20 files retrieved from Github based on metadata in the JIRA ticket}. Your answer must be in the format of a Git patch diff that can be applied to one of these files".
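The prompt assembly itself is nothing more than string formatting. In this sketch, `issue` has the shape returned by the `fetch_issue` sketch above, and `files` stands in for however the agent decided which ~20 repo files are relevant:

```python
PROMPT_TEMPLATE = """\
You will act as a master coder to understand and fix the following issue
as faithfully as you can:

{issue}

You will do so in the context of the following code:

{code}

Your answer must be in the format of a Git patch diff that can be applied
to one of these files."""

def build_prompt(issue: dict, files: dict[str, str]) -> str:
    """Inline the bug description and the selected source files into one big prompt."""
    code = "\n\n".join(f"### {path}\n{text}" for path, text in files.items())
    issue_text = f"{issue['key']}: {issue['summary']}\n\n{issue['description']}"
    return PROMPT_TEMPLATE.format(issue=issue_text, code=code)
```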
This prompt, with the JIRA bug description and code from your Github filled in, will get sent to some LLM chosen by some heuristic built into the agent - say it sends it to ChatGPT.
Then the agent will take the response from ChatGPT and try to parse it as a Git patch. If it respects git patch syntax, the agent will apply it to the Git repo and run something like `make build test`. If that runs without errors, it will generate a PR in your Github and finally output the link to that PR for you to review.
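That whole step is deterministic glue around a few shell commands. A sketch, assuming `git`, `make build test`, and the GitHub `gh` CLI are available and that `repo` is a local checkout; the branch name and commit message are made up:

```python
import subprocess
import tempfile

def run(args, cwd):
    return subprocess.run(args, cwd=cwd, capture_output=True, text=True)

def try_patch(patch_text: str, repo: str) -> str | None:
    """Validate and apply the model's patch, build, and open a PR.
    Returns the PR URL, or None if any deterministic check rejects it."""
    with tempfile.NamedTemporaryFile("w", suffix=".patch", delete=False) as f:
        f.write(patch_text)
        patch = f.name

    if run(["git", "apply", "--check", patch], repo).returncode != 0:
        return None                                   # not valid git-patch syntax
    run(["git", "checkout", "-b", "agent/fix"], repo)
    run(["git", "apply", patch], repo)

    if run(["make", "build", "test"], repo).returncode != 0:
        run(["git", "apply", "-R", patch], repo)      # roll back the failed attempt
        return None

    run(["git", "commit", "-am", "Agent-generated fix"], repo)
    run(["git", "push", "-u", "origin", "agent/fix"], repo)
    pr = run(["gh", "pr", "create", "--fill"], repo)  # prints the PR URL on success
    return pr.stdout.strip() or None
```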
If any of the steps fails, the agent will generate a new prompt for the LLM and try again, for some fixed number of iterations. It may also try a different LLM or try to generate various follow-ups to the LLM (say, it will send a new prompt in the same "conversation" like "compilation failed with the following issue: {output from make build}. Please fix this and generate a new patch."). If there is no success after some number of tries, it will give up and output error information.
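The retry loop around all of this is equally mundane. A sketch reusing the hypothetical `try_patch` above (a real agent would feed the actual `make build` output into the follow-up prompt; this simplified `try_patch` doesn't surface it):

```python
def fix_issue(call_llm, first_prompt: str, repo: str, max_attempts: int = 5) -> str:
    """Keep asking for patches until one survives apply/build/test, or give up."""
    conversation = [first_prompt]
    for attempt in range(1, max_attempts + 1):
        patch = call_llm("\n\n".join(conversation))  # same "conversation", growing each round
        pr_url = try_patch(patch, repo)
        if pr_url is not None:
            return pr_url
        conversation.append(
            "That patch was rejected or `make build test` failed. "
            "Please fix this and generate a new patch."
        )
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```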
You can imagine many complications to this workflow - the agent may interrogate the LLM for more intermediate steps, it may ask the LLM to generate test code or even to generate calls to other services that the agent will then execute with whatever credentials it has.
It's a byzantine concept with lots of jerry-rigging that apparently does work for some use cases. To me it has always seemed like far too much setup work before finding out whether there is any actual benefit for the codebases I work on, so I can't say from experience how well these things work or how much they end up costing.