The LLM has one job: to make code that looks plausible. That's it. No logic has gone into writing that bit of code, so the bugs often won't be like the ones a programmer makes. Instead, it can introduce a whole new class of bug that's way harder to debug.
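To illustrate the kind of thing I mean (a made-up sketch, not any real model output): the function below reads like perfectly idiomatic Python and would sail past a casual review, but its edge behaviour is quietly wrong in a way a human author would rarely write on purpose.

    # Hypothetical example of "plausible-looking" code with a subtle bug.
    def moving_average(values, window):
        """Return the moving average over a sliding window."""
        # Looks idiomatic, but near the start of the list the slice holds
        # fewer than `window` elements while we still divide by `window`,
        # so the early averages are silently deflated.
        return [
            sum(values[max(0, i - window + 1):i + 1]) / window
            for i in range(len(values))
        ]

    print(moving_average([10, 10, 10, 10], 3))  # [3.33, 6.67, 10.0, 10.0]

Whether that counts as a bug depends on what the caller expected, which is exactly why it slips through review.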
https://news.ycombinator.com/item?id=44163194
https://news.ycombinator.com/item?id=44068943
It doesn't optimize "good programs". It optimizes humans' interpretation of good programs. More accurately, it optimizes what low-paid, overworked humans believe are good programs. Are you hiring your best and brightest to code review the LLMs? Even if you do, it still optimizes tricking them. It will also optimize writing good programs, but you act like that's a well-defined and measurable thing.
> I don't know if any of this applies to the arguments
> with access to ground truth
There's the connection. You think you have ground truth. No such thing exists.

In medical AI, where I'm currently working, "ground truth" is usually whatever human experts say about a medical image, and is rarely perfect. The goal is always to do better than whatever the current ground truth is.
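To make that concrete, here's a rough sketch (invented data, not any particular project's pipeline) of what "ground truth" often means in practice: the reference label is just a consensus over disagreeing expert reads, so the thing you score the model against is itself noisy.

    # Hypothetical sketch: "ground truth" as majority vote over imperfect expert reads.
    from collections import Counter

    # Three experts labelling the same five images (invented data).
    expert_reads = [
        ["benign", "benign", "malignant"],
        ["malignant", "malignant", "malignant"],
        ["benign", "malignant", "malignant"],
        ["benign", "benign", "benign"],
        ["malignant", "benign", "malignant"],
    ]

    ground_truth = [Counter(reads).most_common(1)[0][0] for reads in expert_reads]
    unanimous = sum(len(set(reads)) == 1 for reads in expert_reads)

    print(ground_truth)                                  # labels the model is scored against
    print(f"unanimous: {unanimous}/{len(expert_reads)}")  # how soft that reference really is

A model that disagrees with that reference on the split cases isn't necessarily wrong, which is why "better than the current ground truth" is even a coherent goal.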