My AI skeptic friends are all nuts

>>tablet+(OP)
I'm not a skeptic, but I keep LLMs on a short leash.

This is a thoughtful article. Thanks `tptacek

My LLM use is: 1 - tedious stuff; web pages interacting with domain back end. 2 - domain discovery.

In a recent adventure, I used Claude 4 to tease out parameters in a large graph schema. This is a combination of tedium and domain discovery (it's not my graph and I'm not a domain expert). In the first day, Claude uncovered attributes and relations no other LLM or Google search uncovered. And it worked!! The next day, I allowed it to continue. After a bit, results didn't pass the sniff test.

I checked into details of Claude's thinking: it decided to start making up schema attributes and inventing fallback queries on error with more made up attributes. It was "conscious" of its decision to do so. By the time I caught this, Claude had polluted quite a bit of code. Sure, plenty of well placed git commits helped in rolling back code...but it's not quite that simple..over the many git commits were sprinkled plenty of learnings I don't want to toss. It took another two days of carefully going through the code to pull out the good stuff and then roll things back. So now I'm at day five of this adventure with cleaned up code and notes on what we learned.

I suspect continual improvements on tooling will help. Until then, it's a short leash.

>>jhanco+KH
Yeah I'm impressed with its ability to do stuff, but not quite with its results. We have been working on more AI assistance adoption so I asked it to do some decently complex things with json/yml schema definitions and validations (outside the typical json schema we wanted things like conditional validation, etc)... It wrote a LOT of code and took a long time, and kept telling me it would work, and it didn't. I finally stepped in and eliminated roughly 75% of the code in about 10 minutes and got it working. It's great at tedious stuff, but outside of that, I'm skeptical.

>>bherms+bT
IMO, you just noted it’s great at creating tedious (but pointless) stuff?

zlacker