Related: Scaling long-running autonomous coding - https://news.ycombinator.com/item?id=46624541 - Jan 2026 (174 comments)
Edit: As mentioned, I ran `cargo check` on each of the last 100 commits, and it seems every single one of them failed in some way: https://gist.github.com/embedding-shapes/f5d096dd10be44ff82b...
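For anyone who wants to repeat that kind of check, here is a minimal sketch of how it could be scripted. It is not the script behind the gist; the function names (`last_commits`, `cargo_check_passes`, `survey`, `summarize`) are made up for illustration, and it assumes `git` and `cargo` are on the PATH.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def last_commits(repo: str, n: int = 100) -> List[str]:
    """Return the hashes of the last n commits reachable from HEAD, newest first."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", f"--max-count={n}", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def cargo_check_passes(repo: str, commit: str) -> bool:
    """Check out one commit (detached HEAD) and report whether `cargo check` exits 0."""
    subprocess.run(
        ["git", "-C", repo, "checkout", "--quiet", "--detach", commit],
        check=True,
    )
    result = subprocess.run(["cargo", "check", "--quiet"], cwd=repo, capture_output=True)
    return result.returncode == 0


def survey(commits: Iterable[str], passes: Callable[[str], bool]) -> Dict[str, bool]:
    """Map each commit to its pass/fail status, keeping the input order."""
    return {commit: passes(commit) for commit in commits}


def summarize(results: Dict[str, bool]) -> Tuple[int, Optional[str]]:
    """Return (number of failing commits, first passing commit in order, or None)."""
    failing = sum(1 for ok in results.values() if not ok)
    first_ok = next((commit for commit, ok in results.items() if ok), None)
    return failing, first_ok
```

Something like `summarize(survey(last_commits("."), lambda c: cargo_check_passes(".", c)))` would then give the failure count and the newest commit that passes. Note that this moves HEAD around, so it is best run in a throwaway clone.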
CEO stated "We built a browser with GPT-5.2 in Cursor"
instead of
"by dividing agents into planners and workers we managed to get them busy for weeks creating thousands of commits to the main branch, resolving merge conflicts along the way. The repo is 1M+ lines of code but the code does not work (yet)"
[0] https://cursor.com/blog/scaling-agents
[1] https://x.com/kimmonismus/status/2011776630440558799
[2] https://x.com/mntruell/status/2011562190286045552
[3] https://www.reddit.com/r/singularity/comments/1qd541a/ceo_of...
The top comment is indeed baseless hype without a hint of skepticism.
There are also clearly a lot of other skeptical people in that submission. Also, simonw (from that top comment) told me themselves "it's not clear that what they built even runs": https://bsky.app/profile/simonwillison.net/post/3mckgw4mxoc2...
> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
"From scratch" sounds very impressive. "custom JS VM" is as well. So let's take a look at the dependencies [1], where we find
- html5ever
- cssparser
- rquickjs
That's just Servo [2], a Rust-based browser engine initially built by Mozilla (and now maintained by Igalia [3]), but with extra steps. So this supposed "from scratch" browser is just calling out to code written by humans. And after all that it doesn't even compile! It's just plain slop.
[1] - https://github.com/wilsonzlin/fastrender/blob/main/Cargo.tom...
https://github.com/wilson-anysphere/formula
The Actions overview is impressive: there have been 160,469 workflow runs, of which 247 succeeded. The workflows are failing because they have exceeded their spending limit. Of course, the agents couldn't care less.
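To put those two numbers in proportion, a quick back-of-the-envelope calculation (the function name is made up for illustration, and the figures are the ones quoted above):

```python
def success_rate(succeeded: int, total: int) -> float:
    """Fraction of workflow runs that succeeded."""
    return succeeded / total


rate = success_rate(247, 160_469)
# Prints the rate as a percentage and as "one success per N runs".
print(f"{rate:.3%} of runs succeeded, about 1 in {round(1 / rate)}")
```

That works out to roughly 0.15% of runs, or about one success per 650 runs.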
I do not think you are reacting to what I said in good faith.
> he better hope he's on the right side of history here, as otherwise he will have burnt his reputation
That's something I've actually given quite a lot of thought to. My reputation and credibility matter a great deal to me. If it turns out this entire LLM thing was an over-hyped scam I'll take a very big hit to that reputation, and I'll deserve it.
(If AI rises up and tries to kill or enslave us all I'll be too busy fighting back to care.)
But apparently "some pages take a literal minute to load"
Something fishy is happening in their `git log`: it doesn't seem like it was the agents who "autonomously" made things compile in the end. Notice the git usernames and email addresses switching around; even some commits made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
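A minimal sketch of how one might tally the identities in a log like that. It expects one `Name <email>` per line, which is what `git log --format='%an <%ae>'` prints; the function names and the sample log are hypothetical, not data from the repo or the gist.

```python
from collections import Counter


def author_identities(log_text: str) -> Counter:
    """Count commits per 'Name <email>' identity, one identity per input line."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def email_domains(identities: Counter) -> Counter:
    """Aggregate commit counts by the domain part of each author email."""
    domains = Counter()
    for identity, count in identities.items():
        email = identity.rsplit("<", 1)[-1].rstrip(">")
        domains[email.rpartition("@")[2]] += count
    return domains


# Hypothetical log output for illustration only.
SAMPLE_LOG = """\
Agent Bot <agent@example.com>
Agent Bot <agent@example.com>
Jane Dev <jane@example.org>
ec2-user <ec2-user@ip-10-0-0-1.ec2.internal>
"""

identities = author_identities(SAMPLE_LOG)
for identity, count in identities.most_common():
    print(count, identity)
```

An unexpected domain in `email_domains(identities)`, such as an internal cloud hostname, is the kind of thing that stands out quickly once the log is grouped this way.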
Correct, but Gas Town [1] already happened and what's more _actually worked_, so this experiment is both useless (because it doesn't demonstrate working software) _and_ derivative (because we've already seen that you can set up a project which, for roughly the spend of a single developer, churns out more code than any human could read in a week).
> Something fishy is happening in their `git log`, it doesn't seem like it was the agents who "autonomously" actually made things compile in the end. Notice the git username and email addresses switching around, even a commit made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
Gonna need to look closer into it when I have time, but seems they manually patched it up in the end, so the original claim still doesn't stand :/
Same user did a similar thing by creating an AWK interpreter written in Go using LLMs: https://github.com/kolkov/uawk -- as the creator of (I think?) the only AWK interpreter written in Go (https://github.com/benhoyt/goawk), I was curious. It turns out that if there's only one item in the training data (GoAWK), AI likes to copy and paste freely from the original. But again, it's poorly tested and poorly benchmarked.
I just don't see how one can get quality like this, without being realistic about code review, testing, and benchmarking.
>"To test this system, we pointed it at an ambitious goal: building a web browser from scratch."
and then near the end, they say:
>"Hundreds of agents can work together on a single codebase for weeks, making real progress on ambitious projects."
This means they only make progress toward it, but do not "build a web browser from scratch".
If you're curious, the State of Utopia (will be available at https://stateofutopia.com ) did build a web browser from scratch, though it used several packages for the networking portion of it.
See my other comments and posts for links.
- JustHTML [1], which in practice [2] is a port of html5ever [3] to Python.
- justjshtml, which is a port of JustHTML to JavaScript :D [4].
- MiniJinja [5] was recently ported to Go [6].
All three projects have one thing in common: comprehensive test suites which were used to guardrail and guide AI.
References:
1. https://github.com/EmilStenstrom/justhtml
2. https://friendlybit.com/python/writing-justhtml-with-coding-...
3. https://github.com/servo/html5ever
4. https://simonwillison.net/2025/Dec/15/porting-justhtml/
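The guardrail idea behind those ports can be pictured as a small conformance loop: run the candidate implementation over a fixed set of (input, expected) cases and hand the failures back to whoever, or whatever, is writing the code. The thread does not describe how the projects above wire this up, so the following is only a generic sketch with made-up names.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_conformance(
    impl: Callable[[Any], Any],
    cases: Iterable[Tuple[Any, Any]],
) -> List[Tuple[Any, Any, Any]]:
    """Return (input, expected, actual) for every case where impl disagrees."""
    failures = []
    for source, expected in cases:
        try:
            actual = impl(source)
        except Exception as exc:  # a crash counts as a failed case
            actual = f"raised {type(exc).__name__}"
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


# Toy example: str.upper stands in for the implementation under test.
cases = [("a", "A"), ("b", "C")]
for source, expected, actual in run_conformance(str.upper, cases):
    print(f"FAIL {source!r}: expected {expected!r}, got {actual!r}")
```

The value comes from the test corpus rather than the loop: with thousands of independent cases, a change that merely looks plausible shows up as a failure.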
I went through the motions. There are various points in the repo history where compilation is possible, but it's obscure. They got it to compile and operate prior to the article, but several of the PRs since that point broke everything, and this guy went through the effort of fixing it. I'm pretty sure you can just identify the last working commit and pull the version from there, but working out when looks like a big pain in the butt for a proof of concept.
I went through the last 100 commits (>>46647037 ) and nothing there was working (yet/since). Seems now after a developer corrected something it managed to pass `cargo check` without errors, since commit 526e0846151b47cc9f4fcedcc1aeee3cca5792c1 (Jan 16 02:15:02 2026 -0800)
Would be interesting if someone who has managed to run it tries it on some actually complicated text layout edge cases (like RTL breaking that splits a ligature necessitating re-shaping, also add some right-padding in there to spice things up).
[1] https://github.com/wilsonzlin/fastrender/blob/main/src/layou...
[2] https://github.com/wilsonzlin/fastrender/blob/main/src/layou...
[3] Neither being the right place for defining a struct that should go into computed style imo.
See https://felix.dognebula.com/art/html-parsers-in-portland.htm...
The repo is a live incubator for the harness. We are actively researching the behavior of collaborative long running agents, and may in the future make the browser and other products this research produces more consumable by end users and developers, but it's not the goal for now. We made it public as we were excited by the early results and wanted to share; while far off from feature parity with the most popular production browsers today, we think it has made impressive progress in the last <1 week of wall time.
Given the interest in trying out the current state of the project, I've merged a more up-to-date snapshot of the system's progress that resolves issues with builds and CI. The experimental harness can occasionally leave the repo in an incomplete state but does converge, which was the case at the time of the post.
I'm here to answer any further questions you have.
[0] https://x.com/wilsonzlin/status/2012398625394221537?s=20
And this is just one part. Not even considering the fully sandboxed, mini operating system for running webapps.
Plus that linked comment doesn't even say it's "nothing more than a non-functional wrapper for Servo". It disputes the "from scratch" claim.
Most people aren't interested in a nuanced take though. Someone said something plausible sounding and was voted to top by other people? Good enough for me, have another vote. Then twist and exaggerate a little and post it to another comment section. Get more votes. Rinse and repeat.
I do want to briefly note that the JS VM is custom and not QuickJS. It also implemented subsystems like the DOM, CSS cascade, inline/block/table layouts, paint systems, text pipeline, and chrome, and I'd push back against the assertion that it merely calls out to external code. I addressed these points in more detail at [0].
[0] >>46650998
[1] >>46655608
Briefly, the project implemented substantial components, including a JS VM, DOM, CSS cascade, inline/block/table layout, paint systems, text pipeline, and chrome, and is not merely a Servo wrapper.
[0] >>46650998
[1] >>46655608
> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.
> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.
It's hard to verify because your project didn't actually compile. But now that you've fixed the compilation manually, can you demonstrate the JavaScript actually executing? Some of the people who got the slop compiling claimed credibly that it isn't executing any JavaScript.
You merely have to compile your code, run the binary and open this page - http://acid3.acidtests.org. Feel free to post a video of yourself doing this. Try to avoid the embellishment that has characterised this effort so far.
The "in progress" build has a slightly different rendering but the same result
Can you show us what you did after people failed to compile that project [1]?
There are also questions about the attribution of these commits [2]. Can you share some information?
[0] https://github.com/wilsonzlin/fastrender
[1] https://github.com/wilsonzlin/fastrender/issues/98
[2] https://gist.github.com/embedding-shapes/d09225180ea3236f180...
After a human stepped in to fix it, yes. You can see it yourself here: https://github.com/wilsonzlin/fastrender/issues/98
> Nevertheless, IMHO what’s interesting about this is not the browser itself but rather that AI companies (not just Cursor) are building systems where humans can be out of the loop for days or weeks.
But that's not what they demonstrated here. What they demonstrated, so far, is that you can let agents write millions of lines of code, and eventually, if you actually need to run it, some human needs to "merge the latest snapshot" or do some other management to put the system together into a workable state.
Very different from what their original claims were.
Tesla owner keeps using Autopilot from backseat—even after being arrested:
https://mashable.com/article/tesla-autopilot-arrest-driving-...
http://www.mickdarling.com/2019/07/26/busy-summer/
Not that I would excuse Cursor if they're fudging this either. My opinion is that a large part of the growing skepticism and general disillusionment among engineers in the industry (e.g. the jokes about exiting tech to be a farmer or carpenter, or things like https://imgur.com/6wbgy2L) comes from seeing first-hand that being misleading, abusive, or outright lying is often rewarded quite well, and it's not a particularly new phenomenon.
Not only did I actually build a Web browser myself, from scratch (OK, of course with a working OS and Python and its libraries ;) but mine did work! It took me, what, a few hours, maybe a few days if you add it all together. And not only did it work (namely, I did browse my own website with it), but I had fun with it (!), I learned quite a bit from it (including the provable fact that I can indeed build a Web browser, woohoo!), and finally I did it on... I want to say a few kilowatts at most, including my computer (obviously) but also myself and the food I ate along the way.
So... to each their own ¯\_(ツ)_/¯
Your slop is worthless except to convince gullible investors to give you more money.
At this point, it's 1.5M LOC without the vendored crates (so basically excluding the JS engine etc.). If you compare that to Servo/Ladybird, which are 300k LOC each and actually happen to work, agents do love slinging slop.
    /// The quirks mode of the document.
    #[inline]
    pub fn quirks_mode(&self) -> QuirksMode {
        self.quirks_mode
    }
https://github.com/wilsonzlin/fastrender/blob/3e5bc78b075645...
And then this:
    /// The quirks mode of the document.
    pub fn quirks_mode(&self) -> QuirksMode {
        self.stylist.quirks_mode()
    }
https://github.com/servo/stylo/blob/71737ad5c8b29c143a6c992a...
It seems ChatGPT is still copying segments of code almost verbatim, although sometimes it does weird things; compare these for example:
https://github.com/wilsonzlin/fastrender/blob/3e5bc78b075645...
https://github.com/servo/stylo/blob/71737ad5c8b29c143a6c992a...
https://github.com/wilsonzlin/fastrender/blob/3e5bc78b075645...