zlacker

Cursor's latest “browser experiment” implied success without evidence
1. deng+6J 2026-01-16 18:02:06
>>embedd+(OP)
If you look at the original Cursor post, they say they are currently running similar experiments, such as this Excel clone:

https://github.com/wilson-anysphere/formula

The Actions overview is impressive: there have been 160,469 workflow runs, of which 247 succeeded (a success rate of about 0.15%). The workflows are failing because they have exceeded their spending limit. Of course, the agents couldn't care less.

2. felipe+Kb2 2026-01-17 04:21:10
>>deng+6J
IMHO people are missing the forest for the trees. The point of this experiment is not to build a functional browser but to develop ways for agents to create large codebases from scratch over a very long time span. A web browser is just a convenient target because plenty of documentation, specs, and tests are available.

3. saghm+Kx2 2026-01-17 09:16:14
>>felipe+Kb2
The point is to learn how to make very large codebases that don't compile? Why do you need tests and specs if it's not even going to run, much less run correctly?

4. felipe+hC2 2026-01-17 10:14:11
>>saghm+Kx2
As discussed elsewhere, it is apparently possible to compile and run this particular project. It seems that whatever process they followed allows commits to break the build pretty often.

Nevertheless, IMHO what’s interesting about this is not the browser itself but rather that AI companies (not just Cursor) are building systems where humans can be out of the loop for days or weeks.

5. embedd+3L2 2026-01-17 12:05:01
>>felipe+hC2
> As discussed elsewhere, it is apparently possible to compile and run this particular project.

After a human stepped in to fix it, yes. You can see it yourself here: https://github.com/wilsonzlin/fastrender/issues/98

> Nevertheless, IMHO what’s interesting about this is not the browser itself but rather that AI companies (not just Cursor) are building systems where humans can be out of the loop for days or weeks.

But that's not what they demonstrated here. What they have demonstrated, so far, is that you can let agents write millions of lines of code, and that if you eventually need to run it, some human needs to "merge the latest snapshot" or do some other management to put the system into a workable state.

Very different from their original claims.
