My AI skeptic friends are all nuts

>>tablet+(OP)
The argument that I've heard against LLMs for code is that they create bugs that, by design, are very difficult to spot.

The LLM has one job, to make code that looks plausible. That's it. There's no logic gone into writing that bit of code. So the bugs often won't be like those a programmer makes. Instead, they can introduce a whole new class of bug that's way harder to debug.

>>jszymb+JM
This is a misunderstanding. Modern LLMs are trained with RL to actually write good programs. They aren't just spewing tokens out.

>>mindwo+TN
No, YOU misunderstand. This isn't a thing RL can fix

  https://news.ycombinator.com/item?id=44163194

  https://news.ycombinator.com/item?id=44068943

It doesn't optimize "good programs". It interprets "humans interpretation of good programs." More accurately, "it optimizes what low paid over worked humans believe are good programs." Are you hiring your best and brightest to code review the LLMs?

Even if you do, it still optimizes tricking them. It will also optimize writing good programs, but you act like that's a well defined and measurable thing.

>>godels+0S
This is just semantics. What's the difference between a "human interpretation of a good program" and a "good program" when we (humans) are the ones using it? If the model can write code that passes tests, and meets my requirements, then it's a good programmer. I would expect nothing more or less out of a human programmer.

>>mindwo+tT
> What's the difference between a "human interpretation of a good program" and a "good program" when we (humans) are the ones using it?

Correctness.

> and meets my requirements

It can't do that. "My requirements" wasn't part of the training set.

>>otabde+2V
"Correctness" in what sense? It sounds like it's being expanded to an abstract academic definition here. For practical purposes, correct means whatever the person using it deems to be correct.

> It can't do that. "My requirements" wasn't part of the training set.

Neither are mine, the art of building these models is that they are generalisable enough that they can tackle tasks that aren't in their dataset. They have proven, at least for some classes of tasks, they can do exactly that.

>>mindwo+iY

  > to an abstract academic definition here

Besides the fact that your statement is self contradicting, there is actually a solid definition [0]. You should click the link on specification too. Or better yet, go talk to one of those guys that did their PhD in programming languages.

  > They have proven

Have they?

Or did you just assume?

Yeah, I know they got good scores on those benchmarks but did you look at the benchmarks? Look at the question and look what is required to pass it. Then take a moment and think. For the love of God, take a moment and think about how you can pass those tests. Don't just take a pass at face value and move on. If you do, well I got a bridge to sell you.

[0] https://en.wikipedia.org/wiki/Correctness_(computer_science)

>>godels+H11
Sure,

> In theoretical computer science, an algorithm is correct with respect to a specification if it behaves as specified.

"As specified" here being the key phrase. This is defined however you want, and ranges from a person saying "yep, behaves as specified", to a formal proof. Modern language language models are trained under RL for both sides of this spectrum, from "Hey man looks good", to formal theorem proving. See https://arxiv.org/html/2502.08908v1.

So I'll return to my original point: LLMs are not just generating outputs that look plausible, they are generating outputs that satisfy (or at least attempt to satisfy) lots of different objectives across a wide range of requirements. They are explicitly trained to do this.

So while you argue over the semantics of "correctness", the rest of us will be building stuff with LLMs that is actually useful and fun.

>>mindwo+E71
You have to actually read more than the first line of a Wikipedia article to understand it

  > formal theorem proving

You're using Coq and Lean?

I'm actually not convinced you read the paper. It doesn't have anything to do with your argument. Someone using LLMs with formal verification systems is wildly different than LLMs being formal verification systems.

This really can't work if you don't read your own sources

zlacker