zlacker

[parent] [thread] 48 comments
1. margor+(OP)[view] [source] 2025-05-21 11:23:29
With how stochastic the process is, it's basically unusable for any large-scale task. What's the plan? To roll the dice until the answer pops up? That would maybe be viable if there were a way to evaluate the output automatically, 100% of the time, but with a human in the loop required it becomes untenable.
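To be clear about what "rolling the dice" would even mean: it only works if every roll can be scored automatically, something like this (hypothetical sketch; generate_patch and run_tests are made-up stand-ins, not any real API):

    import random

    def generate_patch(task):
        # stand-in for a stochastic LLM call
        return f"patch for {task}, attempt {random.randint(0, 9)}"

    def run_tests(patch):
        # stand-in for the automatic, 100%-trustworthy evaluator
        return patch.endswith("7")

    def roll_until_pass(task, max_attempts=10):
        for _ in range(max_attempts):
            patch = generate_patch(task)
            if run_tests(patch):   # viable only if no human is needed here
                return patch
        return None                # out of rolls; a human must now triage

Swap run_tests for a human reviewer and the cost of every failed roll lands on a person.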
replies(4): >>diggan+8 >>eterev+C >>rsynno+F6 >>Traube+Rq
2. diggan+8[view] [source] 2025-05-21 11:26:06
>>margor+(OP)
> What's the plan?

Call me old school, but I find the "divide and conquer" workflow to be as helpful when working with LLMs as without them. Although what counts as a "large scale task" varies by model and implementation. Some models/implementations (seemingly Copilot) struggle with even the smallest change, while others breeze through them. Lots of trial and error is needed to find that line for each model/implementation :/
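Concretely, it looks something like this for me (rough sketch; ask_llm is a made-up stand-in for whatever client/tool you actually use):

    # Divide and conquer: several small, reviewable prompts instead of one big one.
    def ask_llm(prompt):
        # stand-in for your actual model/tool
        return "<model output for: " + prompt.splitlines()[-1] + ">"

    steps = [
        "Write a function signature and docstring for parsing RFC 3339 timestamps.",
        "Now implement the body of that function.",
        "Now write three unit tests covering the edge cases.",
    ]

    context = ""
    for step in steps:
        answer = ask_llm(context + "\n" + step)
        print(answer)              # human review: each piece is small enough to check
        context += "\n" + answer   # verified output becomes context for the next step

Each step is small enough to verify before it becomes context for the next one.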

replies(3): >>mjburg+H4 >>noneth+P4 >>safety+8a
3. eterev+C[view] [source] 2025-05-21 11:33:07
>>margor+(OP)
The plan is to improve AI agents from their current ~intern level to the level of a good engineer.
replies(9): >>ethano+e1 >>rsynno+a3 >>mnky98+N3 >>interi+k7 >>ehnto+18 >>serial+F8 >>marmak+Z8 >>cyanyd+09 >>einste+Sj1
◧◩
4. ethano+e1[view] [source] [discussion] 2025-05-21 11:38:38
>>eterev+C
Seems like that is taking a very long time, on top of some very grandiose promises being made today.
replies(2): >>DrillS+Z2 >>infect+D3
◧◩◪
5. DrillS+Z2[view] [source] [discussion] 2025-05-21 11:53:07
>>ethano+e1
Third AI Winter from overpromise/underdeliver when?
replies(1): >>rsynno+47
◧◩
6. rsynno+a3[view] [source] [discussion] 2025-05-21 11:54:57
>>eterev+C
I mean, I think this is a _lot_ worse than an intern. An intern isn't constantly going to make PRs with failing CI, for a start.
◧◩◪
7. infect+D3[view] [source] [discussion] 2025-05-21 11:58:42
>>ethano+e1
I look back over the past 2-3 years and am pretty amazed at how quickly change and progress have come. The promises are indeed large, but the speed of progress has been fast. I'm not defending the promises, but “taking a very long time” does not seem like an accurate representation.
replies(4): >>owebma+44 >>ethano+m4 >>zeroon+i7 >>bakugo+H7
◧◩
8. mnky98+N3[view] [source] [discussion] 2025-05-21 12:00:37
>>eterev+C
Yes, but they were supposed to be PhD level 5 years ago if you were listening to sama et al.
replies(1): >>rchaud+Jl
◧◩◪◨
9. owebma+44[view] [source] [discussion] 2025-05-21 12:02:18
>>infect+D3
> The promises are indeed large but the speed of progress has been fast

And at the same time, absurdly slow? ChatGPT is almost 3 years old, and AI still has pretty much no positive economic impact.

replies(3): >>infect+m6 >>derekt+g9 >>Workac+Zr
◧◩◪◨
10. ethano+m4[view] [source] [discussion] 2025-05-21 12:04:41
>>infect+D3
I guess it probably depends on what you are doing. Outside of layers on top of these things (tooling), I personally haven't seen much progress.
replies(1): >>infect+y6
◧◩
11. mjburg+H4[view] [source] [discussion] 2025-05-21 12:06:06
>>diggan+8
The relevant scale is the number of hard constraints on the solution code, not the size of the task as measured by "hours it would take the median programmer to write".

So, e.g., one line of code that needs to handle dozens of hard constraints on the system (using a specific class, method, device, memory-management scheme, etc.) will very rarely be output correctly by an LLM.

Likewise "blank-page, vibe coding" can be very fast if "make me X" has only functional/soft-constraints on the code itself.

"Gigawatt LLMs" have brute-forced there way to having a statistical system capable of usefully, if not universally, adhreading to one or two hard constraints. I'd imagine the dozen or so common in any existing application is well beyond a Terawatt range of training and inference cost.

replies(1): >>cyanyd+V8
◧◩
12. noneth+P4[view] [source] [discussion] 2025-05-21 12:06:54
>>diggan+8
It's hard for me to think of a small, clearly defined coding problem an LLM can't solve.
replies(2): >>jodrel+v6 >>mrguyo+VY
◧◩◪◨⬒
13. infect+m6[view] [source] [discussion] 2025-05-21 12:18:46
>>owebma+44
Saying “AI has no economic impact” ignores reality. The financials of the major players clearly show otherwise: both B2C and B2B applications are already profitable and proven. While the APIs are still more experimental, and it’s unclear how much value businesses can ultimately extract from them, claiming there’s no economic impact is willful blindness. AGI may be far off, but companies are already extracting value on the consumer side and, more slowly, via the APIs.
replies(1): >>ehnto+t9
◧◩◪
14. jodrel+v6[view] [source] [discussion] 2025-05-21 12:20:21
>>noneth+P4
"Find a counter example to the Collatz conjecture".
◧◩◪◨⬒
15. infect+y6[view] [source] [discussion] 2025-05-21 12:20:48
>>ethano+m4
What a time we live in. I guess it depends how pessimistic you are.
replies(1): >>lcnPyl+P9
16. rsynno+F6[view] [source] 2025-05-21 12:21:46
>>margor+(OP)
I suspect that the plan is that MS has spent a lot, really a LOT, of money on this nonsense, and there is now significant pressure to put something, anything, out, even if it is worse than useless.
◧◩◪◨
17. rsynno+47[view] [source] [discussion] 2025-05-21 12:24:06
>>DrillS+Z2
Third? It’ll be the tenth or so.
◧◩◪◨
18. zeroon+i7[view] [source] [discussion] 2025-05-21 12:25:24
>>infect+D3
I feel like we've made barely any progress. It's still good at the things ChatGPT was originally good at, and bad at the things it was bad at. There's some small incremental refinement, but it doesn't really represent a qualitative jump like ChatGPT originally was. I don't see AI replacing actual humans without another step jump like that.
replies(1): >>Workac+Iq
◧◩
19. interi+k7[view] [source] [discussion] 2025-05-21 12:26:23
>>eterev+C
You are really underselling interns. They learn from a single correction, sometimes even without one, all by themselves. Their ability to integrate previous experience into the context of new problems is far, far above anything I've ever seen from LLMs.
◧◩◪◨
20. bakugo+H7[view] [source] [discussion] 2025-05-21 12:28:49
>>infect+D3
> I look back over the past 2-3 years and am pretty amazed with how quick change and progress have been made.

Now look at the past year specifically, and only at the models themselves, and you'll quickly realize that there's been very little real progress recently. Claude 3.5 Sonnet was released 11 months ago, and the current SOTA models are only marginally better in terms of pure performance on real-world tasks.

The tooling around them has clearly improved a lot, and neat tricks such as reasoning have been introduced to help models tackle more complex problems, but the underlying transformer architecture is already being pushed to its limits and it shows.

Unless some new revolutionary architecture shows up out of nowhere and sets a new standard, I firmly believe that we'll be stuck at the current junior level for a while, regardless of how much Altman & co. insist that AGI is just two more weeks away.

◧◩
21. ehnto+18[view] [source] [discussion] 2025-05-21 12:31:20
>>eterev+C
They are not intern level.

Even if it could perform at a level similar to an intern on a programming task, it lacks a great deal of the other attributes a human brings to the table, including how humans integrate into a team of other agents (human or otherwise). I won't bother listing them, as we are all humans.

I think the hype is missing the forest for the trees, and I think exactly this multi-agent dynamic might be where the trees start to fall down in front of us. That, and the currently insurmountable issues of context and coherence over long time horizons.

replies(2): >>Tade0+0c >>Workac+Yp
◧◩
22. serial+F8[view] [source] [discussion] 2025-05-21 12:36:07
>>eterev+C
This looks much worse than an intern. This feels like a good engineer who has brain damage.

When you look at it from afar, it looks potentially good, but once you start looking into it for real, you realize none of it makes any sense. Then you make simple suggestions, and it does something that looks like what you asked for, yet completely misses the point.

An intern, no matter how bad, can only waste so much time and energy.

This makes wasting time and introducing mind-bogglingly stupid bugs infinitely scalable.

◧◩◪
23. cyanyd+V8[view] [source] [discussion] 2025-05-21 12:37:48
>>mjburg+H4
Keep in mind that this model of using LLMs assumes the underlying dataset converges to production-ready code. That's never been proven, because we know they scraped source code without attribution.
◧◩
24. marmak+Z8[view] [source] [discussion] 2025-05-21 12:38:22
>>eterev+C
The plan went from AI being a force multiplier to a resource-hungry beast that has to be fed in the hope that it's good enough to justify its hunger.
◧◩
25. cyanyd+09[view] [source] [discussion] 2025-05-21 12:38:27
>>eterev+C
I plan to be a billionaire
◧◩◪◨⬒
26. derekt+g9[view] [source] [discussion] 2025-05-21 12:41:06
>>owebma+44
OpenAI alone is on track to generate as much revenue as Asus or US Steel this year ($10-$15 billion). I don't know how you can say AI has had no positive economic impact.
replies(3): >>owebma+Ia >>Simian+0x >>einste+qj1
◧◩◪◨⬒⬓
27. ehnto+t9[view] [source] [discussion] 2025-05-21 12:42:25
>>infect+m6
The financials are all inflated by the perception of future impact. This includes the current subscriptions, as businesses are attempting to use AI to some economic benefit, but it's not all going to work out to be useful.

It will take some time for whatever reality is to actually show truthfully in the financials. When VC money stops subsidising datacentre costs, and businesses have to weigh the full price against real value provided, that is when we will see the reality of the situation.

I am content to be wrong either way, but my personal prediction is that if gains in model competence slow down around now, businesses will not be replacing humans en masse, and the value provided will be notable but not world-changing as expected.

◧◩◪◨⬒⬓
28. lcnPyl+P9[view] [source] [discussion] 2025-05-21 12:45:59
>>infect+y6
To their point, there hasn’t been any huge breakthrough in this field since the “attention is all you need” paper. Not really any major improvements to model architecture, as far as I am aware. (Admittedly, this is a new field of study to me.) I believe one hope is to develop better methods for self-supervised learning; I am not sure of the progress there. Most practical improvements have been on the hardware and tooling side (GPUs and, e.g., PyTorch).

Don’t get me wrong: the current models are already powerful and useful. However, there is still a lot of reason to remain skeptical of an imminent explosion in intelligence from these models.

replies(1): >>infect+Na
◧◩
29. safety+8a[view] [source] [discussion] 2025-05-21 12:47:47
>>diggan+8
I mean, I guess this isn't very ambitious, but it's a meaningful time saver if I basically just write code in natural language and then Copilot generates the real code based on that. I don't have to look up syntax details, or what some function somewhere was named, etc. It performs very accurately this way. It probably makes me 20% more efficient. It doubles my efficiency in a language I'm unfamiliar with.

I can't fire half my dev org tomorrow with that approach, I can't really fire anyone, so I guess it would be a big letdown for a lot of execs. Meanwhile though we just keep incrementally shipping more stuff faster at higher quality so I'm happy...

This works because it treats the LLM like what it actually is: an exceptionally good if slightly random text transformer.
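A made-up example of the pattern (not actual Copilot output): I write the comment, it fills in the function.

    # Parse "key=value;key=value" settings strings into a dict,
    # skipping blank entries and stripping whitespace around keys and values.
    def parse_settings(s):
        pairs = (p for p in s.split(";") if p.strip())
        return {k.strip(): v.strip() for k, v in (p.split("=", 1) for p in pairs)}

    print(parse_settings(" retries = 3 ; timeout=30 "))  # {'retries': '3', 'timeout': '30'}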

◧◩◪◨⬒⬓
30. owebma+Ia[view] [source] [discussion] 2025-05-21 12:52:50
>>derekt+g9
That is not even one month of a single big tech company's revenue; it is a globally negligible impact. Three years of talk about AI changing the world, $10B in revenue, and no ecosystem making money around it besides friends and VCs pumping and dumping LLM wrappers.
replies(1): >>derekt+lv
◧◩◪◨⬒⬓⬔
31. infect+Na[view] [source] [discussion] 2025-05-21 12:53:23
>>lcnPyl+P9
You’re totally right that there hasn’t been a fundamental architectural leap like “attention is all you need”; that was a generational shift. But I’d argue that what we’ve seen since is a compounding of scale, optimization, and integration that has changed the practical capabilities quite dramatically, even if it doesn’t look flashy in an academic sense. The models are qualitatively different at the frontier: more steerable, more multimodal, and increasingly able to reason across context. It might not feel like a revolution on paper, but the impact in real-world workflows is adding up quickly. Perhaps all of that can be put in the bucket of “tooling”, but from my perspective there have still been quite large leaps, looking at cost differences alone.

For some reason my pessimism meter goes off when I see single-sentence arguments like “change has been slow”. Thanks for bringing the conversation back.

replies(1): >>skydha+ad
◧◩◪
32. Tade0+0c[view] [source] [discussion] 2025-05-21 13:03:49
>>ehnto+18
My impression is that Copilot acts a lot like one of my former coworkers, who struggled with:

- Being a parent to a small child and the associated sleep deprivation.

- His reluctance to read documentation.

- There being a language barrier between him and the project owners. Emphasis here, as the LLM acts like someone who speaks through a particularly good translation service, but otherwise doesn't understand the language spoken.

◧◩◪◨⬒⬓⬔⧯
33. skydha+ad[view] [source] [discussion] 2025-05-21 13:13:23
>>infect+Na
I'm all for flashy in the academic sense, because we can let engineers sort out the practical aspects, especially by combining flashy academic approaches. The flaws of the LLM architecture could be predicted from the original paper; no amount of engineering can compensate for that.
◧◩◪
34. rchaud+Jl[view] [source] [discussion] 2025-05-21 14:09:42
>>mnky98+N3
Especially ironic considering he's neither a developer nor a PhD. He's the smooth talking "MBA idea guy looking for a technical cofounder" type that's frequently decried on HN.
◧◩◪
35. Workac+Yp[view] [source] [discussion] 2025-05-21 14:34:20
>>ehnto+18
The real missing of the forest for the trees is thinking that software, and the way users use computers, is going to remain static.

Software today is written to accommodate every possible need of every possible user, plus a bunch of unneeded selling-point features on top of that: massive, sprawling code bases made to deliver one-size-fits-all utility.

I don't need 3 million LOC Excel 365 to keep track of who is working on the floor on what day this week. Gemini 2.5 can write an applet that does that perfectly in 10 minutes.
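Something in this ballpark, as a sketch of the shape of it (not actual Gemini output):

    # Tiny floor-roster applet: who is working on what, on which day this week.
    from collections import defaultdict

    roster = defaultdict(list)  # day -> list of (name, task)

    def assign(day, name, task):
        roster[day].append((name, task))

    def on_floor(day):
        return roster[day]

    assign("Mon", "Alice", "packing")
    assign("Mon", "Bob", "forklift")
    print(on_floor("Mon"))  # [('Alice', 'packing'), ('Bob', 'forklift')]

That covers the actual need without three million lines of spreadsheet.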

replies(2): >>ehnto+Nu >>Bughea+I12
◧◩◪◨⬒
36. Workac+Iq[view] [source] [discussion] 2025-05-21 14:38:02
>>zeroon+i7
As a non-programmer non-software engineer, the programs I can write with modern SOTA models are at least 5x larger than the ones GPT-4 could make.

LLMs are like bumpers on bowling lanes. Pro bowlers don't get much utility from them. Total noobs are getting more and more strikes as these "smart" bumpers get better and better at guiding their ball.

37. Traube+Rq[view] [source] 2025-05-21 14:38:37
>>margor+(OP)
> to roll the dice

This was discussed here

>>43988913

◧◩◪◨⬒
38. Workac+Zr[view] [source] [discussion] 2025-05-21 14:43:48
>>owebma+44
There is the huge blind spot where tech workers think LLMs are being made primarily to either assist them or replace them.

Nobody seems to consider that LLMs are democratizing programming, and allowing regular people to build programs that make their work more efficient. I can tell you that at my old school manufacturing company, where we have no programmers and no tech workers, LLMs have been a boon for creating automation to bridge gaps and even to forgo paid software solutions.

This is where the change LLMs will bring will come from. Not from helping an expert dev write boilerplate 30% faster.

replies(1): >>dttze+FG
◧◩◪◨
39. ehnto+Nu[view] [source] [discussion] 2025-05-21 14:59:27
>>Workac+Yp
I don't believe it will remain static; in fact, it's done nothing but change every year for my entire career.

I do like the idea of smaller programs fitting smaller needs being easy to access for everyone, and in my post history you would see me advocate for bringing software wages down so that even small businesses can have software capabilities in house. Software has so much to give to society outside of big VC flips and tech monoliths. Maybe AI is how we get there in the end.

But I think that supplanting humans with an AI workforce in the very near future might be stretching the projection of its capabilities too far. LLMs will be augmenting how businesses operate from now on, but I am seeing clear roadblocks that make an autonomous AI agent unviable, and they seem to be fundamental limitations of LLMs, e.g. continuity and context. Recent advances seem to come from supplemental systems that try to patch those limitations. That suggests those limits are tricky, and until a new approach shows up, that is what drives my lack of faith in an AI agent revolution.

But it is clear to me that I could be wrong, and it could be a spectacular miscalculation. Maybe the robots will make me eat my hat.

◧◩◪◨⬒⬓⬔
40. derekt+lv[view] [source] [discussion] 2025-05-21 15:01:45
>>owebma+Ia
There's a pretty wide gulf between being one of the most important companies in the global marketplace as Microsoft, Apple, and Amazon are and "having no economic impact".

I agree that most of the AI companies describe themselves and their products in hyperbolic terms. But that doesn't mean we need to counter that with equally absurd opposing hyperbole.

replies(1): >>owebma+dG
◧◩◪◨⬒⬓
41. Simian+0x[view] [source] [discussion] 2025-05-21 15:10:42
>>derekt+g9
And what is their burn rate? Everyone fails to mention the amount they are spending for this return.
◧◩◪◨⬒⬓⬔⧯
42. owebma+dG[view] [source] [discussion] 2025-05-21 16:02:42
>>derekt+lv
There is no hyperbole. I think AI will change the world in the next 10 years, but compare it to the iPhone: 3 years in, the economic impact was much, much bigger, and that was just one brand of smartphone.
◧◩◪◨⬒⬓
43. dttze+FG[view] [source] [discussion] 2025-05-21 16:05:45
>>Workac+Zr
Low code/no code/visual programming has been around forever. They all had issues. LLMs will also have the same issues and cost even more.
replies(1): >>Workac+IA1
◧◩◪
44. mrguyo+VY[view] [source] [discussion] 2025-05-21 17:43:24
>>noneth+P4
There are several in the linked post, primarily:

"Your code does not compile" and "Your tests fail"

If you have to tell an intern that more than once on a single task, there are going to be conversations.

◧◩◪◨⬒⬓
45. einste+qj1[view] [source] [discussion] 2025-05-21 19:33:13
>>derekt+g9
Revenue, not profit.

If it costs them even just one more dollar than that revenue number to provide that service (spoiler, it does), then you could say AI has had no positive economic impact.

Considering we know they’re being subsidized by obscene amounts of investment money just like all other frontier model providers, it seems pretty clear it’s still a negative economic impact, regardless of the revenue number.

◧◩
46. einste+Sj1[view] [source] [discussion] 2025-05-21 19:35:33
>>eterev+C
Without handholding (aka being used as a tool by a competent programmer instead of as an independent “agent”), they’re currently significantly worse than an intern.
◧◩◪◨⬒⬓⬔
47. Workac+IA1[view] [source] [discussion] 2025-05-21 21:16:58
>>dttze+FG
I'm not aware of any that you speak/type plain English to.
replies(1): >>saati+P32
◧◩◪◨
48. Bughea+I12[view] [source] [discussion] 2025-05-22 02:04:25
>>Workac+Yp
I don't know. I guess it depends on what you classify as change. I don't really view software as having changed all that much since around the mid 70s, as HLLs began to become more popular. What programmers do today and what they did back then would be easily recognizable to both groups if we had time machines. I don't see how AI really changes things all that much. It's got the same scalability issues that low code/no code solutions have always had, and those go way back.

The main difference is that you can use natural language, but I don't see that as being inherently better than, say, drawing a picture using some flowcharting tools in a low code platform. You just reintroduce the same problems natural languages have always had, and the reason we didn't choose them in the first place: they are not strict enough and need lots of context. Giving an AI very specific sentences to define my project in natural language, and making sure it has lots of context, begins to look an awful lot like pseudocode to me. So as you learn to approach using AI in a way that produces what you want, you naturally get closer and closer to just specifying the code.

What HAS indisputably changed is the cost of hardware, which has driven accessibility and caused more consumer-facing software to be made.

◧◩◪◨⬒⬓⬔⧯
49. saati+P32[view] [source] [discussion] 2025-05-22 02:28:53
>>Workac+IA1
You never heard of COBOL? Its original premise was that you could use something resembling English to write programs.