zlacker

[return to "My AI skeptic friends are all nuts"]
1. jszymb+JM[view] [source] 2025-06-03 03:48:33
>>tablet+(OP)
The argument that I've heard against LLMs for code is that they create bugs that, by design, are very difficult to spot.

The LLM has one job, to make code that looks plausible. That's it. There's no logic gone into writing that bit of code. So the bugs often won't be like those a programmer makes. Instead, they can introduce a whole new class of bug that's way harder to debug.

◧◩
2. mindwo+TN[view] [source] 2025-06-03 04:05:29
>>jszymb+JM
This is a misunderstanding. Modern LLMs are trained with RL to actually write good programs. They aren't just spewing tokens out.
◧◩◪
3. godels+0S[view] [source] 2025-06-03 04:50:30
>>mindwo+TN
No, YOU misunderstand. This isn't a thing RL can fix

  https://news.ycombinator.com/item?id=44163194

  https://news.ycombinator.com/item?id=44068943
It doesn't optimize "good programs". It interprets "humans interpretation of good programs." More accurately, "it optimizes what low paid over worked humans believe are good programs." Are you hiring your best and brightest to code review the LLMs?

Even if you do, it still optimizes tricking them. It will also optimize writing good programs, but you act like that's a well defined and measurable thing.

◧◩◪◨
4. tuhlat+g12[view] [source] 2025-06-03 14:59:20
>>godels+0S
Those links mostly discuss the original RLHF used to train e.g. ChatGPT 3.5. Current paradigms are shifting towards RLVR (reinforcement learning with verifiable rewards), which absolutely can optimize good programs.

You can definitely still run into some of the problems eluded to in the first link. Think hacking unit tests, deception, etc -- but the bar is less "create a perfect RL environment" than "create an RL environment where solving the problem is easier than reward hacking." It might be possible to exploit a bug in the Lean 4 proof assistant to prove a mathematical statement, but I suspect it will usually be easier for an LLM to just write a correct proof. Current RL environments aren't as watertight as Lean 4, but there's certainly work to make them more watertight.

This is in no way a "solved" problem, but I do see it as a counter to your assertion that "This isn't a thing RL can fix." RL is powerful.

[go to top]