Citation needed. In fact, I think this pretty clearly hits the "extraordinary claims require extraordinary evidence" bar.
Here someone just claimed that it is "entirely clear" LLMs will become super-human, without any evidence.
https://en.wikipedia.org/wiki/Extraordinary_claims_require_e...
The way you've framed it, it sounds like the only evidence you'll accept is the thing actually having happened.
In my mind, at this point we either need (a) some previously "hidden" super-massive source of training data, or (b) another architectural breakthrough. Without either, this is a game of optimization, and the scaling curves are going to plateau really fast.
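To make the "plateau without new data" worry concrete, here's a toy sketch (my own illustration, using a Chinchilla-style power law with roughly the published fitted constants, not anything measured on current frontier models). With the data term held fixed, loss bottoms out at a floor that no amount of extra parameters or compute can push past:

    # Toy illustration of a Chinchilla-style scaling law L(N, D) = E + A/N^a + B/D^b.
    # Constants are roughly the published Chinchilla fit; treat them as illustrative only.
    def loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
        return E + A / N**alpha + B / D**beta

    D = 1.4e12  # training tokens, held fixed ("no hidden super-massive data source")
    for N in (1e9, 1e10, 1e11, 1e12):
        print(f"params N={N:.0e}: predicted loss={loss(N, D):.3f}")
    # Loss creeps toward E + B/D**beta and stalls there: scaling parameters alone
    # stops paying off once the fixed data term dominates.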
a) It hasn't even been a year since the last big breakthrough; the first reasoning models only came out in September (with o3 newer still), and we don't know how far those will go yet. I'd wait a second before assuming the low-hanging fruit has all been picked.
b) I think coding is a really good environment for agents / reinforcement learning. Rather than requiring a continual supply of new training data, we give the model coding tasks to execute (writing / maintaining / modifying code) and then test its output for correctness. We could, for example, take the entire history of a code-base and have the model re-implement each change against the evolving unit + integration tests, as sketched below. My hunch (with no extraordinary evidence) is that this is how coding agents start to nail some of the higher-level abilities.
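A rough sketch of what that reward signal could look like (entirely hypothetical; reward_from_tests and the git/pytest plumbing are my own illustration, not any real training harness): apply a candidate patch to a throwaway copy of the repo and score it by whether the existing unit + integration tests still pass.

    # Hypothetical sketch: score a model-generated patch with the repo's own tests.
    import shutil, subprocess, tempfile

    def reward_from_tests(repo_dir: str, patch: str) -> float:
        """Apply the patch to a scratch copy of repo_dir and return 1.0 if the
        test suite passes, 0.0 otherwise (a pass fraction would give a smoother
        reward, but this is the simplest version)."""
        workdir = tempfile.mkdtemp()
        try:
            shutil.copytree(repo_dir, workdir, dirs_exist_ok=True)
            applied = subprocess.run(["git", "apply", "-"], input=patch, text=True,
                                     cwd=workdir, capture_output=True)
            if applied.returncode != 0:
                return 0.0  # an unappliable patch earns nothing
            try:
                tests = subprocess.run(["python", "-m", "pytest", "-q"], cwd=workdir,
                                       capture_output=True, text=True, timeout=600)
            except subprocess.TimeoutExpired:
                return 0.0  # hangs count as failures too
            return 1.0 if tests.returncode == 0 else 0.0
        finally:
            shutil.rmtree(workdir, ignore_errors=True)

Replay the repo's history commit by commit and you get a long curriculum of such tasks for free, with the evolving tests acting as the grader.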
I think everyone expected AlphaGo to be the research direction to pursue, which is why it was so surprising that LLMs turned out to work.
My friend, we have been living in a world of exponentially increasing AI capability, at least for the last few years; who knows what the future will bring!
GPT-4 was another big improvement, and was the first time I found it useful for non-trivial queries. 4o was nice, and there was a decent bump with the reasoning models, especially for coding. However, since o1 it has felt a lot more like optimization than systematic improvement, and I don't see a way for current reasoning models to advance to the point of designing and implementing medium+ coding projects without the assistance of a human.
Like the other commenter mentioned, I'm sure it will happen eventually with architectural improvements, but I wouldn't bet on 1-5 years.
Last month I had a staff member design and build a distributed system that would have been far beyond their capabilities without AI assistance. As a business owner, this lets me reduce my dependency on (and the power of) the senior devs.
Don't parrot the line you read online that these systems are unable to do this stuff; it comes from the clueless, or from devs coping. Not only are they capable, but they're improving by the month.
Does that junior dev take responsibility when that system breaks?
But... the capabilities (and rate of progression) of these top-tier LLMs aren't hype.
Very soon our AI-built software systems will break down in spectacular, never-before-seen ways, and I'll have the product to help with that.
Secondly, people are not just blindly having AI write code with no idea how it works. The AI is acting as a senior consultant helping the developer to design and build the systems and generating parts of the code as they work together.
"Theoretical limitations of multi-layer Transformer": https://arxiv.org/abs/2412.02975
Because exponentially growing costs with linear (or not even measurable) improvements are not a great trajectory.
o4 has no problem with the examples from the first paper (appendix A). You can see that its reasoning here is also sound: https://chatgpt.com/share/681b468c-3e80-8002-bafe-279bbe9e18.... It's not conclusive, unfortunately, since this falls within the date range of its training data. Reasoning models did kill off a large class of "easy logic errors" people found in the earlier generations, though.
They are not reasoning in any real sense; they are writing pages and pages of text before giving you the answer. This is not so unlike the "ever bigger training data" method, just applied to the output instead of the input.
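To make that concrete, here's a toy sketch (my illustration; generate() is a placeholder for whatever model call you have, not a real API). The "reasoning" recipe is largely buying accuracy with more sampled output text, e.g. long traces plus a majority vote:

    # Toy sketch of scaling the *output* instead of the training data:
    # sample many long reasoning traces and keep the majority final answer.
    from collections import Counter

    def answer_with_test_time_compute(prompt, generate, n_samples=16):
        finals = []
        for _ in range(n_samples):
            # generate() is a hypothetical stand-in for the model call.
            trace = generate(prompt + "\nThink step by step, then write FINAL: <answer>")
            # Keep whatever follows the last FINAL: marker as the answer.
            finals.append(trace.rsplit("FINAL:", 1)[-1].strip())
        return Counter(finals).most_common(1)[0][0]

More samples and longer traces generally help, but nothing new has been learned; it's the same model spending more tokens.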
Metrics like training data set size are less interesting now given the utility of smaller synthetic data sets.
Once AI tech is more diffused to factory automation, robotics, educational systems, scientific discovery tools, etc., then we could measure efficiency gains.
My personal metric for the next 5 to 10 years: US national debt and interest payments are perhaps increasing exponentially, and since nothing will change politically to address this, exponential AI capability growth will either juice productivity enough to save us economically, or it won't.