Cursor's latest “browser experiment” implied success without evidence

>>embedd+(OP)
The blog [0] is worded rather conservatively, but on Twitter [2] the claim is pretty obvious and the hype effect is achieved [2].

CEO stated "We built a browser with GPT-5.2 in Cursor"

instead of

"by dividing agents into planners and workers we managed to get them busy for weeks creating thousands of commits to the main branch, resolving merge conflicts along the way. The repo is 1M+ lines of code but the code does not work (yet)"

[0] https://cursor.com/blog/scaling-agents

[1] https://x.com/kimmonismus/status/2011776630440558799

[2] https://x.com/mntruell/status/2011562190286045552

[3] https://www.reddit.com/r/singularity/comments/1qd541a/ceo_of...

>>paulus+0w
Even then, "resolving merge conflicts along the way" doesn't mean anything, as there are two trivial merge strategies that are always guaranteed to work ('ours' and 'theirs').
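
To make the point concrete, here is a toy sketch (not git's implementation) of how mechanical "always pick one side" resolution is. The function name `resolve_conflicts` and the sample input are made up for illustration; in git itself the closest equivalents are `-s ours` and the `-X ours` / `-X theirs` strategy options.

```python
def resolve_conflicts(text: str, side: str) -> str:
    """Resolve git-style conflict markers by always keeping one side.

    Hypothetical helper for illustration only: it never inspects the
    content, so a "successful" resolution says nothing about correctness.
    """
    if side not in ("ours", "theirs"):
        raise ValueError("side must be 'ours' or 'theirs'")
    out = []
    state = "normal"  # which part of a conflict hunk we are in
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            state = "ours"      # start of our half of the hunk
        elif line.startswith("=======") and state == "ours":
            state = "theirs"    # separator: their half begins
        elif line.startswith(">>>>>>> ") and state == "theirs":
            state = "normal"    # end of the hunk
        elif state == "normal" or state == side:
            out.append(line)    # keep unconflicted lines and the chosen side
    return "\n".join(out)


conflicted = "a = 1\n<<<<<<< HEAD\nb = 2\n=======\nb = 3\n>>>>>>> feature\nc = 4"
print(resolve_conflicts(conflicted, "ours"))
print(resolve_conflicts(conflicted, "theirs"))
```

Either call always "succeeds", which is exactly why a count of resolved conflicts is not evidence that the merged code works.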

>>deng+sx
Haha. True, CI success was not part of PR accept criteria at any point.

If you view the PRs, they bundle multiple fixes together, at least according to the commit messages. The next hurdle will be to guardrail agents so that they only implement one task and don't cheat by modifying the CI pipeline.
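
A minimal sketch of what such a guardrail could look like, assuming the list of changed paths is already available (for example from a diff). The patterns and the name `protected_paths_touched` are hypothetical, not anything Cursor has described.

```python
from fnmatch import fnmatchcase

# Hypothetical list of CI-related paths an agent should not be allowed to edit.
PROTECTED_PATTERNS = [".github/workflows/*", ".gitlab-ci.yml", "ci/*"]


def protected_paths_touched(changed_paths, patterns=PROTECTED_PATTERNS):
    """Return the changed paths that match any protected CI pattern."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Example change set (made-up file names): one source file, one CI workflow.
violations = protected_paths_touched(["src/render.rs", ".github/workflows/ci.yml"])
if violations:
    print("rejecting change, CI files modified:", violations)
```

A check like this only blocks edits to the pipeline definition; it does not stop an agent from deleting or weakening the tests that the pipeline runs.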

>>paulus+fA
If I had a nickel for every time I've seen a human dev disable/xfail/remove a failing test "because it's wrong" and then proceed to break production, I would have several nickels, which is not much, but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.

>>former+NB
Had humans not been doing this already, I would have walked into Samsung with the demo application that was working an hour before my meeting, rather than the Android app that could only show me the opening logo.

There are a lot of really bad human developers out there, too.

>>mickda+KA1
> Entrepreneur, CEO and founder of Tomorrowish a social media DVR

So you flubbed managing a project and are now blaming your employees. Classy.
