zlacker

[return to "My AI skeptic friends are all nuts"]
1. jhanco+KH[view] [source] 2025-06-03 02:44:22
>>tablet+(OP)
I'm not a skeptic, but I keep LLMs on a short leash.

This is a thoughtful article. Thanks, 'tptacek.

My LLM use falls into two buckets: (1) tedious stuff, like web pages interacting with a domain back end; (2) domain discovery.

In a recent adventure, I used Claude 4 to tease out parameters in a large graph schema. It's a combination of tedium and domain discovery (it's not my graph and I'm not a domain expert). On the first day, Claude uncovered attributes and relations that no other LLM or Google search had surfaced. And it worked!! The next day, I let it continue. After a bit, the results stopped passing the sniff test.

I dug into the details of Claude's thinking: it had decided to start making up schema attributes and, on error, inventing fallback queries with still more made-up attributes. It was "conscious" of its decision to do so. By the time I caught this, Claude had polluted quite a bit of code. Sure, plenty of well-placed git commits helped with rolling the code back, but it's not quite that simple: sprinkled across those many commits were plenty of learnings I didn't want to toss. It took another two days of carefully going through the code to pull out the good stuff and then roll the rest back. So now I'm at day five of this adventure, with cleaned-up code and notes on what we learned.
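The salvage workflow above can be sketched with plain git, assuming good and hallucinated changes landed in the same history (the repo layout, file names, and commit messages here are hypothetical, not from the original post):

```shell
# Hypothetical sketch: keep the learnings, drop the hallucinated schema attrs.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name you

# Day 1: a commit with verified, real schema attributes.
echo "real schema attr" > schema.md
git add . && git commit -qm "good: discovered real attributes"
GOOD=$(git rev-parse HEAD)

# Day 2: a mixed commit - hallucinated attrs alongside notes worth keeping.
echo "made-up attr" > schema.md
echo "useful note" > notes.md
git add . && git commit -qm "mixed: hallucinated attrs + useful notes"

# Salvage: pull schema.md back from the known-good commit, keep notes.md.
git restore --source="$GOOD" -- schema.md
git add . && git commit -qm "salvage: keep notes, drop hallucinated attrs"
cat schema.md
```

`git restore --source=<commit> -- <path>` pulls individual files back from a known-good commit without rewriting history; when good and bad edits share a single file, the interactive `git checkout -p <commit> -- <path>` lets you pick hunks one at a time.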

I suspect continual improvements in tooling will help. Until then, it's a short leash.

2. wrapti+aI[view] [source] 2025-06-03 02:49:28
>>jhanco+KH
Domain discovery is so underrated. LLMs remove so much friction, which makes everything incredibly accessible.

If LLMs couldn't do anything else, that alone would still warrant an "invention of the century" sticker.

3. aloha2+bR[view] [source] 2025-06-03 04:39:56
>>wrapti+aI
> Domain discovery is so underrated. LLMs remove so much friction that makes everything so incredibly accessible.

And, unfortunately, they also remove the friction of not having the information in the first place. I've read a bunch of docs from people who talked to Glean to explore a new topic; when it's a topic I'm actually very familiar with, four times out of five the result is somewhere between misleading and catastrophically wrong. Any internal terminology that doesn't match common usage outside our organization poisons the whole session: the model makes things up to join the meanings together, and the prompter is none the wiser.

I trust AI only as a gap filler in domains where I'm already an expert, or where there's little internal context; anything else is intellectual suicide.

4. awongh+781[view] [source] 2025-06-03 07:35:59
>>aloha2+bR
I feel like if you use it the right way, asking the AI to write code or to give domain context in a specific form, the answers it gives are easy enough to verify, and it's domain knowledge you wouldn't have gotten easily through a series of Google searches. LLMs as a kind of search can work great.