zlacker

Thread: "My AI skeptic friends are all nuts"
1. mlsu+ur 2025-06-03 00:07:17
>>tablet+(OP)
I tried the agent thing on:

- Large C codebase (new feature and bugfix)

- Small Rust codebase (new feature)

- Brand new greenfield frontend for an in-spec, documented OpenAPI API

- Small fixes to an existing frontend

It failed _dramatically_ in all cases. Maybe I'm using this thing wrong, but it is Devin-level fail. It gets diffs wrong. It passes phantom arguments to tools. It screws up basic features. It pulls hundreds of lines of changes into unrelated files as "refactoring". It refactors again and again, over itself, partially, so that the unfinished boneyard of an old refactor sits in the codebase like a skeleton (and those tokens are also sent up to the model).

It genuinely makes an insane, horrible, spaghetti MESS of the codebase. Any codebase. I expected it to be good at Svelte and SolidJS, since those are popular JavaScript frameworks with lots of training data. Nope, it's bad. This was a few days ago, with Claude 4. Seriously, people, what am I missing here with this agents thing? They are such gluttonous eaters of tokens that I'm beginning to think these agent posts are paid advertising.

◧◩
2. turtle+5t 2025-06-03 00:21:25
>>mlsu+ur
Have it make small changes. Restrict it to a single file and scope each change to <50 lines or so: small enough that you can easily digest it without review becoming a chore.
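A limit like this could also be checked mechanically before a change is reviewed. The following is only a rough sketch, not something the commenter described: it assumes the agent's change is available as a unified diff (for example, the text output of `git diff`), and the names `diff_stats`, `within_scope`, `MAX_FILES`, and `MAX_CHANGED_LINES` are hypothetical.

```python
import re

# Hypothetical limits based on the suggestion above: one file, fewer than 50 changed lines.
MAX_FILES = 1
MAX_CHANGED_LINES = 50

# Matches unified-diff hunk headers such as "@@ -1,3 +1,4 @@" (a missing count means 1).
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def diff_stats(diff_text):
    """Return (set of touched file paths, number of added plus removed lines)."""
    files = set()
    changed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: the first character says what kind of line this is.
            if line.startswith("+"):
                changed += 1
                new_left -= 1
            elif line.startswith("-"):
                changed += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker, not a real line
            else:
                # Context line: present in both the old and new versions.
                old_left -= 1
                new_left -= 1
            continue
        match = HUNK_RE.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
        elif line.startswith(("--- ", "+++ ")):
            # File header lines; drop any trailing tab-separated timestamp.
            path = line[4:].split("\t")[0].strip()
            if path != "/dev/null":
                # Strip the "a/" or "b/" prefix that git adds to paths.
                files.add(re.sub(r"^[ab]/", "", path))
    return files, changed


def within_scope(diff_text, max_files=MAX_FILES, max_changed=MAX_CHANGED_LINES):
    """True if the diff touches at most max_files files and fewer than max_changed lines."""
    files, changed = diff_stats(diff_text)
    return len(files) <= max_files and changed < max_changed
```

Counting lines from the hunk headers, rather than just looking for a leading `+` or `-`, keeps a removed line such as `-- old comment` (which appears in the diff as `--- old comment`) from being mistaken for a file header. A diff that fails `within_scope` could be rejected and the agent asked to try a narrower change.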
◧◩◪
3. declan+qv 2025-06-03 00:42:48
>>turtle+5t
A small change scoped to <50 lines is easy for a normal software engineer to write. When do the LLMs start doing the hard part?
◧◩◪◨
4. kasey_+nw 2025-06-03 00:51:49
>>declan+qv
When you wire them up to your CI/CD process, with pull requests and the GitHub GUI as your interface, rather than sitting there passively riding along as it prompts you with the changes it’s going to make.

With my async agent, I don’t care how easy the change would be for me to write; it’s easier to tell the agent to do the workflow and come back to it later when I’m ready to review it. If it’s a good change I approve the PR; if not, I close it.
