Cursor's latest “browser experiment” implied success without evidence

>>embedd+(OP)
If you look at the original Cursor post, they say they are currently running similar experiments, for instance, this Excel clone:

https://github.com/wilson-anysphere/formula

The Actions overview is impressive: there have been 160,469 workflow runs, of which 247 succeeded. The workflows are failing because they have exceeded their spending limit. Of course, the agents couldn't care less.

>>deng+6J
IMHO people are missing the forest for the trees. The point of this experiment is not to build a functional browser but to develop ways to make agents create large codebases from scratch over a very long time span. A Web browser is just a convenient target because lots of documentation, specs and tests are available.

>>felipe+Kb2
...but it didn't develop ways of doing that, did it?

Any idiot can have Cursor run for 2 weeks and produce a pile of crap that doesn't compile.

You know the brilliant insight they came out with?

> A surprising amount of the system's behavior comes down to how we prompt the agents. Getting them to coordinate well, avoid pathological behaviors, and maintain focus over long periods required extensive experimentation. The harness and models matter, but the prompts matter more.

i.e. It's kind of hard and we didn't really come up with a better solution than 'make sure you write good prompts'.

Wellll, geeeeeeeee! Thanks for that insight guys!

Come on. This was complete BS. Planners and workers. Cool. Details? Any details? Annnnnnnyyyyy way to replicate it? What sort of prompts did you use? How did you solve the pathological behaviours?

Nope. The vagueness in this post... it's not an experiment. It's just fundraising hype.
