My AI skeptic friends are all nuts

>>tablet+(OP)
I tried the agent thing on:

- Large C codebase (new feature and bugfix)

- Small Rust codebase (new feature)

- Brand new greenfield frontend for an in-spec, documented OpenAPI API

- Small fixes to an existing frontend

It failed _dramatically_ in all cases. Maybe I'm using this thing wrong, but it is Devin-level fail. Gets diffs wrong. Passes phantom arguments to tools. Screws up basic features. Pulls in hundreds of lines of changes to unrelated files in order to refactor. Refactors again and again, over itself, partially, so that the uncompleted boneyard of an old refactor sits in the codebase like a skeleton (those tokens are also sent up to the model).

It genuinely makes an insane, horrible, spaghetti MESS of the codebase. Any codebase. I expected it to be good at Svelte and SolidJS since those are popular JavaScript frameworks with lots of training data. Nope, it's bad. This was a few days ago, Claude 4. Seriously, seriously, people, what am I missing here with this agents thing? They are such gluttonous eaters of tokens that I'm beginning to think these agent posts are paid advertising.

>>mlsu+ur
How are you writing your prompts? I usually break a feature down into smaller, task-level pieces before I prompt an agent (Claude Code in my case) to do anything. A whole feature is often too hard to prompt and specify in enough detail for it to get right.

So I'd say Claude 4 agents today are at a smart-but-fresh-intern level of autonomy. You still have to do the high-level planning and task breakdown, but it can execute on tasks (say, requiring 10 - 200 lines of code excluding tests). Asking it to write much more code (200+ lines) often requires a lot of follow-ups and ends in disappointment.

>>chuckn+Hs
Coding agents should take you through a questionnaire before working. Break down what you are asking for into chunks, point me to key files that are important for this change, etc etc. I feel like a bit of extra prompting would help a lot of people get much better results rather than expecting people to know the arcane art of proompting just by looking at a chat input.
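To make the idea concrete, here is a rough sketch of what such a pre-flight questionnaire could look like. It is not how any existing agent works; `TaskBrief`, its fields, and the question wording are all hypothetical names made up for illustration. The only point is that the agent refuses to build a prompt until the chunks and key files have been supplied.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for the answers a questionnaire would collect."""
    goal: str
    chunks: list[str] = field(default_factory=list)
    key_files: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Return the questions that still have no answer."""
        questions = []
        if not self.goal.strip():
            questions.append("What are you asking for?")
        if not self.chunks:
            questions.append("Can you break the request down into chunks?")
        if not self.key_files:
            questions.append("Which key files are important for this change?")
        return questions

    def to_prompt(self) -> str:
        """Render the brief as prompt text, or fail if anything is unanswered."""
        unanswered = self.missing()
        if unanswered:
            raise ValueError("brief incomplete: " + "; ".join(unanswered))
        lines = [f"Goal: {self.goal}", "Chunks:"]
        lines += [f"  {i}. {chunk}" for i, chunk in enumerate(self.chunks, 1)]
        lines.append("Key files:")
        lines += [f"  - {path}" for path in self.key_files]
        return "\n".join(lines)
```

A chat UI could loop over `missing()` and ask each question in turn, then send `to_prompt()` along with the user's original request.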

>>presen+cv
I am just a muggle, but I have been using Windsurf for months and this is the only way for me to end up with working code.

A significant portion of my prompts are writing and reading from .md files, which plan and document the progress.

When I start a new feature, it begins with: We need to add a new feature X that does ABC, create a .md in /docs to plan this feature. Ask me questions to help scope the feature.

I then manually edit the feature-x.md file, and only then tell the tool to implement it.

Also, after any major change, I say: Add this to docs/current_app_understanding.md.

Every single chat starts with: Read docs/current_app_understanding.md to get up to speed.

The really cool side benefit here is that I end up with solid docs, which I admittedly would have never created in the past.
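For anyone who wants to script this, the prompts above can be sketched as a few small helpers. This is only an illustration under assumptions: the function names are made up, nothing here talks to Windsurf or any other tool, and the wording of `implement_prompt` is a guess, since the workflow above only says to "tell the tool to implement it". The other strings are taken from the prompts quoted above.

```python
from pathlib import Path

DOCS_DIR = Path("docs")
UNDERSTANDING = DOCS_DIR / "current_app_understanding.md"


def session_opener() -> str:
    """First message of every chat: load the running project notes."""
    return f"Read {UNDERSTANDING.as_posix()} to get up to speed."


def plan_prompt(feature: str, behavior: str) -> str:
    """Ask the tool to draft a plan file and to ask scoping questions."""
    return (
        f"We need to add a new feature {feature} that does {behavior}, "
        f"create a .md in /{DOCS_DIR.as_posix()} to plan this feature. "
        "Ask me questions to help scope the feature."
    )


def implement_prompt(plan_file: Path) -> str:
    """Sent only after the plan file has been edited by hand.

    The wording is an assumption, not quoted from the workflow above.
    """
    return f"Implement the feature as planned in {plan_file.as_posix()}."


def record_prompt() -> str:
    """Sent after any major change to keep the project notes current."""
    return f"Add this to {UNDERSTANDING.as_posix()}."
```

The order matters more than the code: `session_opener()` first, then `plan_prompt(...)`, a manual edit of the plan file, `implement_prompt(...)`, and finally `record_prompt()`.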
