https://github.com/wilson-anysphere/formula
The Actions overview is impressive: There have been 160,469 workflow runs, of which 247 succeeded. The workflows are failing because they have exceeded their spending limit. Of course, the agents couldn't care less.
Any idiot can have Cursor run for 2 weeks and produce a pile of crap that doesn't compile.
You know the brilliant insight they came out with?
> A surprising amount of the system's behavior comes down to how we prompt the agents. Getting them to coordinate well, avoid pathological behaviors, and maintain focus over long periods required extensive experimentation. The harness and models matter, but the prompts matter more.
I.e., it's kind of hard and we didn't really come up with a better solution than 'make sure you write good prompts'.
Wellll, geeeeeeeee! Thanks for that insight guys!
Come on. This was complete BS. Planners and workers. Cool. Details? Any details? Annnnnnnyyyyy way to replicate it? What sort of prompts did you use? How did you solve the pathological behaviours?
Nope. The vagueness in this post... it's not an experiment. It's just fundraising hype.