zlacker

[parent] [thread] 17 comments
1. embedd+(OP)[view] [source] 2026-01-14 23:02:00
Did anyone manage to run the tests from the repository itself? The code seems filled with errors and warnings, and as far as I can tell none of them are due to the platform I'm on (Linux). I went and looked at the Actions workflow history for some pages, and it seems CI has been failing for a while; PRs have also all been failing CI but were merged anyway. How exactly was this verified to be something that could be held up as a successful example, or am I misunderstanding the point they are trying to make? They mention a screenshot, but they never actually say whether their goal was met, do they?

I'm not sure the approach of "completely autonomous coding" is the right way to go. I feel like we'd use these tools more effectively if we treated them as something a human wields to accomplish a task, and leaned into letting the human drive, because otherwise quality spirals out of control so quickly.

replies(4): >>snek_c+nD >>csomar+4W >>seanc+ae2 >>idopms+DP2
2. snek_c+nD[view] [source] 2026-01-15 03:29:48
>>embedd+(OP)
I found the codebase very hard to navigate. Hundreds (over a thousand?) of tiny files, each under 200 lines of code, in deeply nested subdirectories. I wanted to find where the JavaScript engine was and where the DOM implementation was located, and I couldn't easily find either, even using the GitHub search feature. I'm not exactly sure what this browser implements, or how.

Even their README is kind of crappy. Ideally you want installation instructions right near the top, but instead it's broken up across multiple files. The README link labeled "running + architecture" (pointing at a file actually called browser_ui.md???) is hard to follow. There is no explicit list of dependencies, and again no explanation of how JavaScript execution works, or how rendering works, really.

It's impressive that they got such a big project to be built by agents and to compile, but this codebase... feels like AI slop, and you couldn't pay me to maintain it. You could try to get AI agents to maintain it, but my prediction is that past some scale they would have a hard time untangling their own mess. You'd just be left with permanent bugs you can't easily fix.
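
If you want the sprawl in numbers, something like this would show it (a rough sketch on my part; the clone path and the .rs extension are guesses, since as I said I'm not even sure what's implemented where):

    # count source files and how many stay under 200 lines (paths assumed)
    import os

    sizes = []
    for root, _, files in os.walk("fastrender"):  # hypothetical clone dir
        for name in files:
            if name.endswith(".rs"):  # assuming the sources are mostly Rust
                with open(os.path.join(root, name), errors="ignore") as f:
                    sizes.append(sum(1 for _ in f))

    print(len(sizes), "files;", sum(s < 200 for s in sizes), "under 200 lines")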

replies(3): >>boness+k01 >>embedd+dj1 >>datsci+eE1
3. csomar+4W[view] [source] 2026-01-15 06:30:28
>>embedd+(OP)
You can stop reading the article from here:

> Today's agents work well for focused tasks, but are slow for complex projects.

What does "slow" mean? Slower than humans? In need of faster GPUs? What does it even imply? Too slow to produce the next token? Too many attempts to be usable? Needs human intervention?

This piece was written to keep the bubble inflating.

4. boness+k01[view] [source] [discussion] 2026-01-15 07:04:38
>>snek_c+nD
So the chain of events here is: copy existing tutorials and publicly available code, train the model to spit something like it back out when asked, hand it a mature-ish specification, and now they jitter and jumble towards a facsimile of a junior copy-paste outsourcing nightmare they can’t maintain (creating exciting liabilities for all parties involved).

I can’t shake the feeling that simply being shameless about the copy-paste (i.e. copyright infringement) would let existing tools do much the same thing faster and more efficiently. Download Chromium, search-replace ‘Google’ with ‘ME!’, run Make… if I shipped that as a small app, someone would point out it’s actually solvable as a bash one-liner.

There’s a lot of utility in better search and natural-language interaction. The siren call of feedback loops plays with our sense of time and might be clouding our sense of progress and utility.

replies(1): >>kungfu+N31
5. kungfu+N31[view] [source] [discussion] 2026-01-15 07:37:00
>>boness+k01
You raise a good point, which is that autonomous coding needs to be benchmarked on designs/challenges where the exact thing being built isn't part of the model's training set.
replies(1): >>Nitpic+Rb1
6. Nitpic+Rb1[view] [source] [discussion] 2026-01-15 08:37:19
>>kungfu+N31
swe-REbench does this. They gather real issues from GitHub repos on a roughly monthly basis and test the models against them. On their leaderboard you can use a slider to select only issues created after a model was released and see the stats. It works for open models; it's a bit uncertain for closed ones. Not perfect, but it's the best we have for this idea.
7. embedd+dj1[view] [source] [discussion] 2026-01-15 09:37:34
>>snek_c+nD
> It's impressive that they got such a big project to be built by agents and to compile

But that's the thing: it doesn't compile, it has a ton of errors, and CI seems to have been broken for ages... What exactly is supposed to be impressive here, that it managed to generate a bunch of code that doesn't even compile?

What in the holy hackers is this even about? Am I missing something obvious here? How is this news?

replies(2): >>askl+ll1 >>underd+Fo1
8. askl+ll1[view] [source] [discussion] 2026-01-15 09:56:16
>>embedd+dj1
> What in the holy hackers is this even about? Am I missing something obvious here?

It's about hyping up Cursor and writing a blog post. You're not supposed to look at or use the code, obviously.

9. underd+Fo1[view] [source] [discussion] 2026-01-15 10:24:47
>>embedd+dj1
Looks like it doesn't compile for at least one other guy (I myself haven't tried): https://github.com/wilsonzlin/fastrender/issues/98

Yeah, answers need to be given.

replies(1): >>snek_c+k12
10. datsci+eE1[view] [source] [discussion] 2026-01-15 12:36:50
>>snek_c+nD
To steelman the vibecoders’ perspective, I think the point is that the code is not meant for you to read.

Anyone who has looked at AI art, read AI stories, listened to AI music, or really interacted with AI in any meaningfully critical way would recognize that this was the only predictable result given the current state of AI generated “content”. It’s extremely brittle, and collapses at the smallest bit of scrutiny.

But I guess (to continue steelmanning) the paradigm has shifted entirely. Why do we even need an entire browser for the whole internet? Why can’t we just vibe code a “browser” on demand for each web page we interact with?

I feel gross after writing this.

replies(2): >>embedd+hJ1 >>snek_c+P02
11. embedd+hJ1[view] [source] [discussion] 2026-01-15 13:06:35
>>datsci+eE1
If it's not meant to be read, and not meant to be run since it doesn't compile (and doesn't seem to have compiled for quite some time), what is it meant to demonstrate?

That agents can write a bunch of code by themselves? We already knew that, and what's even the point of that if the code doesn't work?

I feel like I'm still missing what this entire project and blogpost is about. Is it supposed to be all theoretical or what's the deal?

replies(1): >>datsci+db2
12. snek_c+P02[view] [source] [discussion] 2026-01-15 14:25:11
>>datsci+eE1
I've had AI write some very nice, readable code, but I make it go one function at a time.
replies(1): >>datsci+u82
13. snek_c+k12[view] [source] [discussion] 2026-01-15 14:27:49
>>underd+Fo1
Cursor is in the business of selling you more tokens, so it makes sense that they would exaggerate the capabilities of their models, and even advertise them being used to produce lots of code over a period of weeks. This would probably cost you thousands in API usage fees.
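
Back of the envelope (every number below is hypothetical, not something from the post):

    # rough cost sketch; all figures are assumptions
    price_per_mtok = 15.0        # $/million output tokens (assumed)
    tokens_per_day = 10_000_000  # agent fleet output per day (assumed)
    days = 21                    # a few "weeks" of autonomous runs
    print(f"${price_per_mtok * tokens_per_day / 1e6 * days:,.0f}")  # -> $3,150

Even with these modest guesses you're into the thousands.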
14. datsci+u82[view] [source] [discussion] 2026-01-15 14:58:12
>>snek_c+P02
Writing code one function at a time is not the 100x speedup being hyped all over HN. I also write my code one function at a time, often assisted by various tools, some of them considered “AI”.

Writing code one function at a time is the furthest thing from what is being showcased in TFA.

15. datsci+db2[view] [source] [discussion] 2026-01-15 15:11:19
>>embedd+hJ1
You and me both, bud. I often feel these days that humanity has never had a more fractured reality, and worse, those fractures are very binary and tribal. I cope by trying to find fundamental truths that are supported by overwhelming evidence rather than focusing on speculation.

I guess the fundamental truth that I’m working towards for generative AI is that it appears to have asymptotic performance with respect to recreating whatever it’s trying to recreate. That is, you can throw unlimited computing power and unlimited time at trying to recreate something, but there will still be a missing essence that separates the recreation from the creation. In very small snippets, and for very large compute, there may be reasonable results, but it will never be able to completely replace what can be created in meatspace by meatpeople.

Whether the economics of the tradeoff between “nearly recreated” and “properly created” is net positive is what I think this constant ongoing debate is about. I don’t think it’s ever going to be “it always makes sense to generate content instead of hiring someone for this”, but rather something messier: “in this case, we should generate content”.

replies(1): >>embedd+Wz2
16. seanc+ae2[view] [source] 2026-01-15 15:24:07
>>embedd+(OP)
Code filled with errors and warnings? PRs merged with failing CI?

So I guess they've achieved human parity then?

(I'll see myself out)

17. embedd+Wz2[view] [source] [discussion] 2026-01-15 16:37:07
>>datsci+db2
No, but this blog post is on a whole other level. Usually the stuff they showcase at least does something; it isn't shovelware that doesn't compile.
18. idopms+DP2[view] [source] 2026-01-15 17:30:52
>>embedd+(OP)
> I'm not sure the approach of "completely autonomous coding" is the right way to go.

I suspect the author of the post would agree. This feels much more like an experiment to push the limits of LLMs than anything they're looking to seriously use as a product (or even the basis of a product).

I think the more interesting question is when the approach of completely autonomous coding will be the right way to go. LLMs are definitely progressing along a spectrum: can't do it -> can do it with help -> can do it alone but the code isn't great -> can do it alone with good code. Right now I'd say they're only at that final stage for very small projects (e.g. simple Python scripts), but it seems inevitable that they will get there for increasingly large ones.

[go to top]