zlacker

[return to "My AI skeptic friends are all nuts"]
1. jhanco+KH[view] [source] 2025-06-03 02:44:22
>>tablet+(OP)
I'm not a skeptic, but I keep LLMs on a short leash.

This is a thoughtful article. Thanks, tptacek.

My LLM use falls into two buckets: (1) tedious stuff, like web pages interacting with the domain back end, and (2) domain discovery.

In a recent adventure, I used Claude 4 to tease out parameters in a large graph schema. This is a combination of tedium and domain discovery (it's not my graph and I'm not a domain expert). In the first day, Claude uncovered attributes and relations no other LLM or Google search uncovered. And it worked!! The next day, I allowed it to continue. After a bit, results didn't pass the sniff test.

I checked the details of Claude's thinking: it had decided to start making up schema attributes and inventing fallback queries on error, with more made-up attributes. It was "conscious" of its decision to do so. By the time I caught this, Claude had polluted quite a bit of code. Sure, plenty of well-placed git commits helped in rolling back the code, but it's not quite that simple: sprinkled across those many commits were plenty of learnings I didn't want to toss. It took another two days of carefully going through the code to pull out the good stuff and then roll things back. So now I'm at day five of this adventure, with cleaned-up code and notes on what we learned.

I suspect continual improvements on tooling will help. Until then, it's a short leash.

2. lechat+zT[view] [source] 2025-06-03 05:11:55
>>jhanco+KH
One question is whether, even after all that backpedaling, you feel you could've achieved the same or a similar result in those five days on your own. My experience has been that it's a net plus for productivity, but I'm less sure whether I prefer the way the work feels when so much of it is going back and cleaning up after the growth. (Of course, that sounds like a familiar statement for a lot of engineers before LLMs, too.)
3. johnsm+iL2[view] [source] 2025-06-03 19:27:07
>>lechat+zT
This is why agents suck.

Backpedaling is a massive inefficiency.

A better way is the single clean step approach.

Use the largest LLM you can. Have it generate a single output for one update.

If that update has logical errors or drops anything you asked for, restart, refine, and narrow until it comes out clean.

It's quite hard to plan each step right, but the level of complexity you can reach is far higher than with an agent.

Agents are much better at the shallow/broad problems.

Large LLMs are far better at deep/narrow problems.
