Scaling long-running autonomous coding

>>srames+(OP)
The more I think about LLMs, the stranger it feels trying to grasp what they are. To me, when I'm working with them, they don't feel like intelligence but rather an attempt at mimicking it. You can never trust that the AI actually did something smart or dumb. The judge always has to be you.

Its ability to pattern-match its way through a code base is impressive until it's not, and you always have to pull it back to reality when it goes astray.

Its ability to plan ahead is so limited and its way of "remembering" is so basic. Every day it's a bit like 50 First Dates.

Nonetheless, seeing what can be achieved with this pseudo-intelligence tool leaves me a little in awe. It's the contrast between not being intelligent and achieving clearly useful outcomes if steered correctly, and the feeling that we have just started to understand how to interact with this alien.

>>Chipsh+jZ
> The judge always has to be you.

But you can automate much of that work by having good tests. Why vibe-test AI code when you can code-test it? Spend your extra time thinking about how to make testing even better.
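
As a rough sketch of what code-testing instead of vibe-testing could look like, the snippet below checks a function against fixed examples and general properties rather than eyeballing its output. The function `slugify` is a hypothetical stand-in for AI-written code, and the test names are made up for illustration; only the Python standard library is assumed.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical stand-in for a function an AI assistant produced."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop hyphens left over at either end.
    return slug.strip("-")


def test_known_example() -> None:
    # A fixed input/output pair pins down the intended behavior.
    assert slugify("Hello, World!") == "hello-world"


def test_idempotent() -> None:
    # Property: slugifying an existing slug must not change it.
    for title in ["Hello, World!", "  spaced  out  ", "already-a-slug"]:
        once = slugify(title)
        assert slugify(once) == once


def test_only_safe_characters() -> None:
    # Property: output contains only lowercase letters, digits, and hyphens.
    for title in ["C++ & Rust", "naïve café", "---"]:
        assert re.fullmatch(r"[a-z0-9-]*", slugify(title)) is not None
```

The property checks matter more than the single example here: they keep holding when the model rewrites the function, so the judge is the test suite rather than a human skimming the diff. The functions use the `test_` prefix so a runner such as pytest can collect them, but they can also be called directly.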
