Related: https://simonwillison.net/2026/Jan/27/one-human-one-agent-on...
After three days, I have it working at around 20K LOC, of which ~14K is the browser engine itself + X11, and 6K is just Windows+macOS support.
Source code + CI built binaries are available here if you wanna try it out: https://github.com/embedding-shapes/one-agent-one-browser
Here's my own screenshot of it rendering my blog - https://bsky.app/profile/simonwillison.net/post/3mdg2oo6bms2... - it handles the layout and CSS gradients really well, renders the SVG feed icon but fails to render a PNG image.
I thought "build a browser that renders HTML+CSS" was the perfect task for demonstrating a massively parallel agent setup because it couldn't be productively achieved in a few thousand lines of code by a single coding agent. Turns out I was wrong!
Just the day(s) before, I was thinking about this too, and I think what will make the biggest difference is humans who possess "Good Taste". I wrote a bunch about it here: https://emsh.cat/good-taste/
I think the ending is most apt, and where I think we're going wrong right now:
> I feel like we're building the wrong things. The whole vibe right now is "replace the human part" instead of "make better tools for the human part". I don't want a machine that replaces my taste, I want tools that help me use my taste better; see the cut faster, compare directions, compare architectural choices, find where I've missed things, catch when we're going into generics, and help me make sharper intentional choices.
I think the focus with LLM-assisted coding for me has been just that, assisted coding, not trying to replace whole people. It's still me and my ideas driving (and my "Good Taste", explained here: https://emsh.cat/good-taste/), while the LLM does all the things I find more boring.
> prove that no reference implementation code leaked into the produced code
Hmm, yeah, I'm not 100% sure how to approach this, open to ideas. Basic text comparison feels like it'd be too dumb; using an LLM for it might work, letting it reference the other codebases perhaps. Honestly, I don't know how I'd do that.
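One direction could be classic code-clone fingerprinting, along the lines of what MOSS does. A minimal sketch, assuming a naive tokenizer and a made-up 5-gram window; a high overlap would only flag files for manual review, not prove leakage:

    // Sketch: compare token 5-gram fingerprints between generated code and a
    // reference implementation. High Jaccard overlap = "look closer", nothing more.
    use std::collections::HashSet;

    fn fingerprints(src: &str, n: usize) -> HashSet<Vec<String>> {
        // Naive tokenizer: split on non-alphanumerics. A real one would also
        // normalize identifiers so renamed variables still match.
        let tokens: Vec<String> = src
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(str::to_lowercase)
            .collect();
        tokens.windows(n).map(|w| w.to_vec()).collect()
    }

    fn similarity(a: &str, b: &str) -> f64 {
        let (fa, fb) = (fingerprints(a, 5), fingerprints(b, 5));
        let inter = fa.intersection(&fb).count() as f64;
        let union = fa.union(&fb).count() as f64;
        if union == 0.0 { 0.0 } else { inter / union }
    }

    fn main() {
        let ours = "fn layout_block(node: &Node) -> Rect { for child in node.children { } }";
        let reference = "fn layout_block_box(n: &LayoutNode) -> Rect { for kid in n.kids { } }";
        println!("overlap: {:.2}", similarity(ours, reference));
    }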
> And finally, this being the work product of an AI process you can't claim copyright, but someone else could claim infringement so beware of that little loophole.
Good point to be aware of, and I guess by instinct I didn't actually add any license to this project. I thought of adding MIT as I usually do, but I didn't actually make any of this, so I ended up not assigning any license. Worst-case scenario, I guess most jurisdictions would deem either no copyright or that I (implicitly) hold copyright. Guess we'll take that if we get there :)
https://github.com/LadybirdBrowser/ladybird/blob/master/CONT...
It's great to see him make this. I didn't know that he had a blog, but it looks good to me. Bookmarked now.
I feel like although Cursor burned $5 million on their experiment [0], we saw that, and now we have Embedding Shapes' takeaway.
If one person with one agent can produce equal or better results than "hundreds of agents for weeks", then the answer to the question: "Can we scale autonomous coding by throwing more agents at a problem?", probably has a more pessimistic answer than some expected.
Effectively, to me this feels like it answers the question of what happens if we have thousands of AI agents building a complex project autonomously with no human. That idea seems dead now. Humans being in the loop gives much higher productivity and a better end result.
I feel like the lure behind the Cursor project was to find out if it's able to replace humans completely in an extremely large project, and the answer right now is no (and I have a feeling [bias?] that the answer's gonna stay that way).
Emsh, I have a question though: can you tell me about your background, if possible? Have you been involved in browser development or any related endeavours, or was this a first for you? From what I can tell from talking with you, I do feel like the answer is yes, that you have worked in the browser space, but I am still curious to know.
A question coming to my mind: how big would the difference be between 1 expert human + 1 agent, 1 non-expert (say, a junior dev) + 1 agent, and 1 complete non-expert (a normal/less techie person) + 1 agent?
What are your predictions on it?
How would the economics of becoming an "expert", or becoming a jack of all trades (junior dev) in a field, fare with this new technology/toy that we've got?
How much productivity gain would there be from non-expert -> junior dev, and the same question for junior -> senior dev, in this particular context?
[0] Cursor Is Lying To Developers… : https://www.youtube.com/watch?v=U7s_CaI93Mo
Also, someone made a similar comment not too long ago, so people surely are curious if this is possible. Kinda surprised this project's submission didn't get popular.
If you paste https://github.com/embedding-shapes/one-agent-one-browser into the "GitHub Repository" tab it estimates 4.58 person-years and $618,599 by year-2000 standards, or 5.61 years and $1,381,079 according to my very non-trustworthy 2025 estimate upgrade.
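For what it's worth, those numbers line up with the classic organic-mode COCOMO formula (effort in person-months = 2.4 × KLOC^1.05), which I assume is what the calculator uses; the wage and overhead constants below are my guess, not the tool's documented values:

    // Back-of-envelope COCOMO, organic mode. The wage and overhead constants
    // are assumptions for illustration, not verified against the calculator.
    fn main() {
        let kloc = 20.0_f64;                        // ~20K LOC, per the author
        let person_months = 2.4 * kloc.powf(1.05);  // ~55.8
        let person_years = person_months / 12.0;    // ~4.65, close to the 4.58 above
        let cost = person_years * 56_286.0 * 2.4;   // assumed avg wage x overhead
        println!("{person_years:.2} person-years, ~${cost:.0}");
    }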
> I get to evaluate on stuff like links being consistently blue and underlined
Yeah, this browser doesn't have a "default stylesheet" like a regular browser. Probably should have added that, but I was mostly just curious about rendering websites as they come from the web, rather than using what browsers think the web should look like.
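(For context: the consistently-blue-and-underlined links come from the user-agent stylesheet that real browsers prepend before any page CSS. A hypothetical sketch of what bolting one on could look like; this repo has no such hook, the function below is invented for illustration:)

    // Hypothetical: prepend a minimal user-agent stylesheet so page rules
    // still win the cascade at equal specificity. Colors are the classic defaults.
    const DEFAULT_UA_CSS: &str = "a { color: #0000EE; text-decoration: underline; }\n\
        a:visited { color: #551A8B; }";

    fn effective_css(page_css: &str) -> String {
        format!("{DEFAULT_UA_CSS}\n{page_css}")
    }

    fn main() {
        // Page rule overrides the UA blue, as in a real cascade.
        println!("{}", effective_css("a { color: red; }"));
    }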
> It may be that some of the rendering is not supported on windows- the back button certainly isn't.
Hmm, on Windows 11 the back button should definitely work, tried that just last night. Are you perhaps on Windows 10? I haven't tried that myself; it should work, but that might be why.
https://bsky.app/profile/emsh.cat/post/3mdgobfq4as2p
But basically I got curious, and you can see from my other comments to you how much I love golang, so I decided to port the project from rust to golang, and emsh predicts that the project's codebase could even shrink to 10k!
(one point though: I don't have CC (Claude Code), so I'm trying it out with the recently released Kimi k2.5 model and their Kimi Code tool, which I figured would also show the real-world use case of an open-source model!)
Edit: I had written this comment just 2 minutes before you wrote yours, but then I decided to start on the golang project.
I mean, I think I ate through all of my 200 queries in Kimi Code, and it now does display a (browser?) window. I used the shell script to test your website, but it only opens up blank.
I am gonna go sleep so that the 5 hour limits can get recharged again and I will continue this project.
I think it will be really interesting to see this project in golang; there must be a good reason for emsh to say the project can be ~10k in golang.
> since getting the agent to self-select the right scope is usually the main bottleneck
I haven't found this to ever be the bottleneck; what agent and model are you using?
I kind of left the agents to do what they wanted just asking for a port.
Your website does look rotated and the image is the only thing visible in my golang port.
Let me open source it & I will probably try to hammer it some more after I wake up to see how good Kimi is in real world tasks.
https://github.com/SerJaimeLannister/golang-browser
I must admit that it's not working right now. I'm even unable to replicate the earlier state where your website displayed at first, even though really glitchy and with the image zoomed in; now it's only white. Although, oops, looks like I forgot the i in your name and wrote willson instead of willison, as I wasn't wearing specs. Sorry about that.
Now let me see... yeah, now it's displaying something, which is extremely glitchy.
https://github.com/SerJaimeLannister/golang-browser/blob/mai...
I have a file to show how glitchy it is. If anything, I just want someone to tinker around with whether a golang project can reasonably be made out of this rust project.
Simon, I see that you were also interested in go vibe coding haha, and this project has independent tests too! Perhaps you can try this out as well and see how it goes! It would be interesting to see what comes of it!
Alright time for me to sleep now, good night!
The prompts themselves were basically "I'd like this website to render correctly: https://medium.com, here's how it looks for me in Firefox with JavaScript turned off: [Image], figure out what features are missing, add them one-by-one, add regression tests and follow REQUIREMENTS.md and AGENTS.md closely" and various iterations/variations of that, so I didn't expressly ask it to implement specific CSS/HTML features, as far as I can remember. Maybe the first 2-3 prompts I did. I'll upload all the session files in a viewable way so everyone can see for themselves what exactly went on :)
https://tinyapps.org/network.html
Of course, "AI-generated browser is 1MB" is neither here nor there.
Yeah, that's obviously a lot harder, but doable. I've built it for clients, since they pay me, but haven't launched/made public something of my own where I could share the code. I guess that might be a useful next project now.
> This is just, yet another, proof-of-concept.
It's not even a PoC, it's a demonstration of how far off the mark Cursor are with their "experiment" where they were amazed by what "hundreds of agents" built over week(s).
> there's no telling how closely the code mirrors existing open-source implementations if you aren't versed on the subject
This is absolutely true, I tried to get some better answers on how one could even figure that out here: >>46784990
That transcript viewer itself is a pretty fun novel piece of software, see https://github.com/simonw/claude-code-transcripts
Denobox https://github.com/simonw/denobox is another recent agent project which I consider novel: https://orphanhost.github.io/?simonw/denobox/transcripts/ses...
Unfortunately, this context is kind of implicit; I don't actually mention it in the blog post, which I probably should have done. That's my fault.
My comment on the cursor post for context: >>46625491
I placed some specifications + WPT into the repository the agent had access to! https://github.com/embedding-shapes/one-agent-one-browser/tr...
But judging by the session logs, it doesn't seem like the agent saw them; I never pointed it there, and it seems none of the searches returned anything from there.
I'm slightly curious about doing it from scratch again, but this time explicitly pointing it to the specifications, and seeing if it gets better or worse.
- Clear code structure and good architecture (modular approach reminiscent of Blitz, but not as radical; a Blitz-lite).
- Very easy to follow the code and understand how the main render loop works:
- For Mac: main loop is at https://github.com/embedding-shapes/one-agent-one-browser/blob/master/src/platform/macos/windowed.rs#L74
- You can see clearly how UI events are passed to the App to handle.
- App::tick allows the app to handle internal events (Servoshell does something similar with `spin_event_loop` at https://github.com/servo/servo/blob/611f3ef1625f4972337c247521f3a1d65040bd56/components/servo/servo.rs#L176)
- If a redraw is needed, the main render logic is at https://github.com/embedding-shapes/one-agent-one-browser/blob/master/src/platform/macos/windowed.rs#L221 and calls into `render` of App, which computes a display list (layout) and then translates it into commands to the generic painter, which internally turns those into platform-specific graphics operations.
- It's interesting how the painter for Mac uses Cocoa for graphics; very different from Servo, which uses WebRender, or Blitz, which (in some paths) uses Vello (itself using wgpu). I'd say using Cocoa like that might be closer to what React Native does (experts to confirm this, please?). Btw, this kind of platform-specific binding is a strength of AI coding (and a real pain to do by hand).
- Nice modularity between the platform and browser app parts, achieved with the App and Painter traits (see the sketch after this list).
How to improve it further? I'd say try to map how the architecture corresponds to Web standards, such as https://html.spec.whatwg.org/multipage/webappapis.html#event...
Wouldn't have to be precise and comprehensive, but for example parts of App::tick could be documented as an initial attempt to implement a part of the web event-loop and `render` as an attempt at implementing the update-the-rendering task.
You could also split the web engine part from the app embedding it in a similar way to the current split between platform and app.
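To make that seam concrete, here's a rough sketch of what the App/Painter split described above could look like; the method names and types are simplified guesses based on this review, not the repo's actual signatures:

    // Simplified sketch: the platform layer owns the window and event loop,
    // App owns browser state, and Painter hides platform graphics
    // (Cocoa on macOS, X11 on Linux, ...). All names here are illustrative.
    enum UiEvent { Click { x: f64, y: f64 }, KeyDown(char), BackButton }

    enum PaintCmd { Rect { x: f64, y: f64, w: f64, h: f64 }, Text { x: f64, y: f64, s: String } }

    trait Painter {
        fn submit(&mut self, cmds: &[PaintCmd]); // platform-specific drawing
    }

    trait App {
        fn handle(&mut self, ev: UiEvent);     // UI events handed over by the platform
        fn tick(&mut self) -> bool;            // internal events; returns "needs redraw"
        fn render(&mut self) -> Vec<PaintCmd>; // layout -> display list -> paint commands
    }

    // Each platform main loop then reduces to roughly:
    fn frame(app: &mut dyn App, painter: &mut dyn Painter, events: Vec<UiEvent>) {
        for ev in events {
            app.handle(ev);
        }
        if app.tick() {
            painter.submit(&app.render());
        }
    }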
Far superior, and more cost-effective, than the attempt at scaling autonomous agent coding pursued by Fastrender. Shows how the important part isn't how many agents you can run in parallel, but rather how good an idea the human overseeing the project has (or rather: develops).