This is a thoughtful article. Thanks `tptacek
My LLM use is: 1 - tedious stuff; web pages interacting with domain back end. 2 - domain discovery.
In a recent adventure, I used Claude 4 to tease out parameters in a large graph schema. This is a combination of tedium and domain discovery (it's not my graph and I'm not a domain expert). In the first day, Claude uncovered attributes and relations no other LLM or Google search uncovered. And it worked!! The next day, I allowed it to continue. After a bit, results didn't pass the sniff test.
I checked into details of Claude's thinking: it decided to start making up schema attributes and inventing fallback queries on error with more made up attributes. It was "conscious" of its decision to do so. By the time I caught this, Claude had polluted quite a bit of code. Sure, plenty of well placed git commits helped in rolling back code...but it's not quite that simple..over the many git commits were sprinkled plenty of learnings I don't want to toss. It took another two days of carefully going through the code to pull out the good stuff and then roll things back. So now I'm at day five of this adventure with cleaned up code and notes on what we learned.
I suspect continual improvements on tooling will help. Until then, it's a short leash.
If LLMs couldn't do anything else then that alone would still warrant an invention of a century sticker.
With the help of the agent, I was able to iterate through several potential approaches and find the gaps and limitations within the space of an afternoon. By the time we got to the end of that process the LLM wrote up a nice doc of notes on the experiments, and *I* knew what I wanted to do next. Knowing that, I was able to give a more detailed and specific prompt to Claude which then scaffolded out a solution. I spent probably another day tweaking, testing, and cleaning up.
Overall I think it's completely fair to say that Claude saved me a week of dev time on this particular task. The amount of reading and learning and iterating I'd have had to do to get the same result would have just taken 3-4 days of work. (not to mention the number of hours I might have wasted when I got stuck and scrolled HN for an hour or whatever).
So it still needed my discernment and guidance - but there's no question that I moved through the process much quicker than I would have unassisted.
That's worth the $8 in API credit ten times over and no amount of parroting the "stochastic parrot" phrase (see what I did there?) would change my mind.
I think pro-AI people sometimes forget/ignore the second order effects on society. I worry about that.