My AI skeptic friends are all nuts

>>tablet+(OP)
Can someone explain to me what this means?

> People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. ...

Is this what people are really doing? Who is just turning AI loose to modify things as it sees fit? If I'm not directing the work, how does it even know what to do?

I've been subjected to forced LLM integration from management, and there are no "Agents" anywhere that I've seen.

Is anyone here doing this that can explain it?

>>metall+D1
I cut several paragraphs from this explaining how agents work, which I wrote anticipating this exact comment. I'm very happy to have brought you to this moment of understanding --- it's a big one. The answer is "yes, that's exactly what people are doing": "turning LLMs loose" (really, giving them some fixed number of tool calls, some of which might require human approval) to do stuff on real systems. This is exactly what Cursor is about.

I think it's really hard to undersell how important agents are.

We have an intuition for LLMs as a function blob -> blob (really, token -> token, but whatever), and the limitations of such a function, ping-ponging around in its own state space, like a billion monkeys writing plays.

But you can also get go blob -> json, and json -> tool-call -> blob. The json->tool interaction isn't stochastic; it's simple systems code (the LLM could indeed screw up the JSON, since that process is stochastic --- but it doesn't matter, because the agent isn't stochastic and won't accept it, and the LLM will just do it over). The json->tool-call->blob process is entirely fixed system code --- and simple code, at that.

Doing this grounds the code generation process. It has a directed stochastic structure, and a closed loop.

>>tptace+23
I'm sorry but this doesn't explain anything. Whatever it is you have in your mind, I'm afraid it's not coming across on the page. There is zero chance that I'm going to let an AI start running arbitrary commands on my PC, let alone anything that resembles a commit.

What is an actual, real world example?

>>metall+p4
They're not arbitrary, far from it. You have a very constrained set of tools each agent can do. An agent has a "job" if you will.

Maybe the agent feeds your PR to the LLM to generate some feedback, and posts a the text to the PR as a comment. Maybe it can also run the linters, and use that as input to the feedback.

But the at the end of the day, all it's really doing is posting text to a github comment. At worst it's useless feedback. And while I personally don't have much AI in my workflow today, when a bunch of smart people are telling me the feedback can be useful I can't help but be curious!

zlacker