Any idiot can have Cursor run for 2 weeks and produce a pile of crap that doesn't compile.
You know the brilliant insight they came out with?
> A surprising amount of the system's behavior comes down to how we prompt the agents. Getting them to coordinate well, avoid pathological behaviors, and maintain focus over long periods required extensive experimentation. The harness and models matter, but the prompts matter more.
I.e., it's kind of hard, and we didn't really come up with a better solution than 'make sure you write good prompts'.
Wellll, geeeeeeeee! Thanks for that insight guys!
Come on. This was complete BS. Planners and workers. Cool. Details? Any details? Annnnnnnyyyyy way to replicate it? What sort of prompts did you use? How did you solve the pathological behaviours?
Nope. The vagueness in this post... it's not an experiment. It's just fundraising hype.
"We put 200 humans in a room and gave them instructions on how to build a browser. They coded for hours, resolving merge conflicts and producing code that, in the end, did not build without intervention from seniors. We think giving them better instructions leads to better results."
So they actually invented humans? And will it come down to either "managing humans" or "managing agents"? One of the two will be more reliable, more predictable and more convenient to work with. And my guess is that it is not the agents...
Judging by the git log, something is weird.
Nevertheless, IMHO what’s interesting about this is not the browser itself but rather that AI companies (not just Cursor) are building systems where humans can be out of the loop for days or weeks.
After a human stepped in to fix it, yes. You can see it yourself here: https://github.com/wilsonzlin/fastrender/issues/98
> Nevertheless, IMHO what’s interesting about this is not the browser itself but rather that AI companies (not just Cursor) are building systems where humans can be out of the loop for days or weeks.
But that's not what they demonstrated here. What they have demonstrated so far is that you can let agents write millions of lines of code, and eventually, if you actually need to run it, some human needs to "merge the latest snapshot" or do some other management to put the system together into a workable state.
Very different from what their original claims were.