If you aren't building up mental models of the problem as you go, you end up in a situation where the LLM gets stuck at the edges of its capability, and you have no idea even how to help it overcome the hurdle. Then you spend hours backtracking through what it's done, building up the mental model you need, before you can move on. The process is slower and more frustrating than not using AI in the first place.
I guess the reality is, your luck with AI-assisted coding really comes down to the problem you're working on, and how much of it is prior art the LLM has seen in training.
If it helps, for context: I'll go round and round with an agent until I've got roughly what I want, and then I go through and beat everything into my own idiom. I don't push code I don't understand and most of the code gets moved or reworked a bit. I don't expect good structure from LLMs (but I also don't invest the time to improve structure until I've done a bunch of edit/compile/test cycles).
I think of LLMs mostly as a way of unsticking and overcoming inertia (and writing tests). "Writing code", once I'm in flow, has always been pleasant and fast; the LLMs just get me to that state much faster.
I'm sure training data matters, but I think static typing and language tooling matter much more. By way of example: I routinely use LLMs to extend intensely domain-specific code internal to our project.
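A contrived sketch of what I mean, with made-up domain names (nothing here is from a real codebase): once the types pin down the shape of the data, the compiler catches most of the model's wrong guesses before I even run anything.

    // Hypothetical domain types, purely illustrative.
    type Currency = "EUR" | "USD";

    interface Invoice {
      id: string;
      currency: Currency;
      netCents: number;  // amounts kept as integer cents
      vatRate: number;   // e.g. 0.19
    }

    // If the LLM invents a field, passes a string amount, or returns the
    // wrong shape while extending this, tsc rejects it before anything runs.
    function grossCents(inv: Invoice): number {
      return Math.round(inv.netCents * (1 + inv.vatRate));
    }

The model doesn't need to have seen the domain before; it just needs the compiler to push back when it guesses wrong.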
With a web-based system you need repomix or something similar to give the whole project (or parts of it, if you can be bothered to filter) as context, which isn't exactly nifty.
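For what it's worth, the dance looks roughly like this (a sketch from memory, not gospel: the --include flag and defaults may differ, so check npx repomix --help), after which you paste or upload the packed file into the web chat:

    // Rough Node/TypeScript sketch: shell out to repomix to pack a filtered
    // slice of the repo into a single file for pasting into a web chat.
    // Flag names are assumptions from memory; verify with `npx repomix --help`.
    import { execSync } from "node:child_process";

    execSync('npx repomix --include "src/**/*.ts,README.md"', { stdio: "inherit" });

And you get to redo it every time the code changes, which is the not-nifty part.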
Inconsistency and crap code quality aren't solved yet, and these make the agent workflow worse because the human only gets to nudge the AI in the right direction very late in the game. The alternative of interactive, non-agentic workflows allows for more AI hand-holding early on, and better code quality, IMO.
Agents are fine if no human is going to work on the (sub)system going forward, and you only care about the shiny exterior without opening the hood to witness the horrors within.
I have definitely not seen this in my experience (with Aider, Claude and Gemini). While helping me debug an issue, Gemini added a #!/bin/sh line to the middle of the file (which appeared to break things), and despite having that code in the context, it didn't realise that was the issue.
OTOH, when asking for debugging advice in a chat window, I tend to get more useful answers, as opposed to a half-baked implementation that breaks other things. YMMV, as always.
Regardless, Gemini 2.5 Pro is far, far better, and I use it with Roo Code, which is open source and free. You can use the Gemini 2.5 Pro experimental model for free (rate-limited) to get a completely free experience and a taste of it.
Cursor was great and started it all off, but others took notice and now they're all more or less the same. It comes down to UX and preference, but personally I think Windsurf and Roo Code just did a better job here than Cursor.