I think LLM producers could improve their models by quite a margin if customers effectively train the LLM for free, meaning: when people correct the LLM, the companies can use the session context plus the feedback as training data. This enables more convincing responses for finer nuances of context, but it still does not teach the model logical principles.
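A minimal sketch of what "session context + feedback as training data" could look like on the provider side, assuming a chat-message log and a hypothetical `correction_to_example` helper (none of this reflects any vendor's actual pipeline):

```python
import json

def correction_to_example(session_messages, accepted_answer):
    """Turn a session where the user corrected the model into one
    supervised fine-tuning example: the conversation up to and
    including the correction becomes the prompt, and the answer the
    user finally accepted becomes the completion."""
    return {
        "prompt": session_messages,      # full conversation so far
        "completion": accepted_answer,   # the answer the user accepted
    }

# Hypothetical session: the user corrects the model mid-conversation.
session = [
    {"role": "user", "content": "What does HTTP 418 mean?"},
    {"role": "assistant", "content": "418 means Payload Too Large."},
    {"role": "user", "content": "No, 418 is 'I'm a teapot' (RFC 2324)."},
]
accepted = "HTTP 418 'I'm a teapot' is a joke status code from RFC 2324."

example = correction_to_example(session, accepted)
print(json.dumps(example)[:40])
```

Collected at scale, such examples would bias the model toward whatever users happen to accept, which is exactly why the feedback could funnel it in odd directions.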
LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.
Hence the feedback these models get could, in theory, funnel them toward unnecessarily complicated solutions.
No clue whether any research has been done into this; just a thought off the top of my head.
Yup, most models suffer from this. Everyone is raving about million-token context windows, but none of the models can actually get past 20% of that and still give responses as high quality as the very first message.
My whole workflow right now is basically composing prompts outside the agent, letting it run with them, and if something is wrong, restarting the conversation from zero with a rewritten prompt. None of that "No, what I meant was ..."; instead I rewrite the prompt so the agent essentially solves it without any back and forth, just because of this issue that you mention.
Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.
Anytime people start talking about their own "multi-agent orchestration platforms" (or whatever) and I ask for a demonstration of what the actual code looks like, they either haven't shared anything (yet), don't care at all how the code actually is, and/or the code is a horrible vibe-slopped mess that is mostly nonsense.
All these people think that if only we add enough billions of parameters when the LLM is learning and enough tokens of context, then eventually it'll actually understand the code and make sensible decisions. These same people perhaps also believe that if Penn and Teller cut enough ladies in half on stage, they'll eventually be great doctors.
curious to hear if you are still seeing code degradation over time?
Now you don't have to pay a lot of money to get a mediocre solution that works.
All those things that are broken but that you never had the time or money to fix, you can have them fixed now.
The only catch is that you need to periodically review it because it'll accumulate things that are not important, or that were important but aren't anymore.
> if people correct the LLM, the companies can use the session context + feedback to as training.
it definitely seems that way; just the other day coderabbit was asking me where i found x when it told me x didn't exist...

> LLM interaction with customers might become the real learning phase.

sometimes i wonder why i pay for this if i'm supposed to train this thing...