Cursor's latest “browser experiment” implied success without evidence

>>embedd+(OP)
The blog[0] is worded rather conservatively but on Twitter [2] the claim is pretty obvious and the hype effect is achieved [2]

CEO stated "We built a browser with GPT-5.2 in Cursor"

instead of

"by dividing agents into planners and workers we managed to get them busy for weeks creating thousands of commits to the main branch, resolving merge conflicts along the way. The repo is 1M+ lines of code but the code does not work (yet)"

[0] https://cursor.com/blog/scaling-agents

[1] https://x.com/kimmonismus/status/2011776630440558799

[2] https://x.com/mntruell/status/2011562190286045552

[3]https://www.reddit.com/r/singularity/comments/1qd541a/ceo_of...

>>paulus+0w
Even then, "resolving merge conflicts along the way" doesn't mean anything, as there are two trivial merge strategies that are always guaranteed to work ('ours' and 'theirs').

>>deng+sx
Haha. True, CI success was not part of PR accept criteria at any point.

If you view the PRs, they bundle multiple fixes together, at least according to the commit messages. The next hurdle will be to guardrail agents so that they only implement one task and don't cheat by modifying the CI piepeline

>>paulus+fA
If I had a nickel for every time I've seen a human dev disable/xfail/remove a failing test "because it's wrong" and then proceeding to break production I would have several nickels, which is not much, but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.

>>former+NB
> but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.

True, but it is shocking how often claude suggests just disabling or removing tests.

>>vizzie+TX
100%, trying a bit of an experiment like this(similar in that I mostly just care about playing around with different agents, techniques etc.) it has built out literally hundreds of tests. Dozens of which were almost pointless as it decided to mock apis. When the number of failed tests exceeded 40 it just started disabling tests.

zlacker