Can you point to _any_ evidence to support the claim that human software development abilities will be eclipsed by LLMs, other than trying to predict which part of the S-curve we're on?
It becomes a question of how much you believe it's all just training data, and how much you believe the LLM has pieces that are composable. I've given the question in the link as an interview question and had humans be unable to give as thorough an answer (which I choose to believe is due to specialization elsewhere in the stack). So we're already at a place where some human software development abilities have been eclipsed on some questions. Then even if the underlying algorithms don't improve and they just ingest more training data, it doesn't seem like a total guess as to what part of the S-curve we're on: the number of software development questions that LLMs are able to answer successfully will continue to increase.
Seems like the key question is: should we expect AI programming performance to scale well as more compute and specialised training is thrown at it? I don't see why not; it seems like an almost ideal problem domain:
* Short and direct feedback loops
* Relatively easy to "ground" the LLM by running code
* Self-play / RL should be possible (it seems likely that you could also optimise for aesthetics of solutions based on common human preferences)
* Obvious economic value (based on the multi-billion-dollar valuations of VS Code forks)
All these things point to programming being "solved" much sooner than say, chemistry.
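To make the "grounding" bullet concrete, here is a minimal sketch of what an execution-based reward could look like. The file path, the tsc/vitest commands, and the binary 0/1 reward are assumptions for illustration, not a description of any real training setup:

```typescript
// Illustrative sketch only: a binary, execution-grounded reward for a generated patch.
// The file path, the tsc/vitest commands, and the 0/1 reward scale are assumptions.
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

function rewardForCandidate(candidateSource: string): number {
  // Drop the model's proposed solution into the project under test.
  writeFileSync("src/solution.ts", candidateSource);
  try {
    // "Grounding": actually compile and run the test suite instead of guessing.
    execSync("npx tsc --noEmit && npx vitest run", { stdio: "ignore" });
    return 1; // it compiles and the tests pass
  } catch {
    return 0; // it doesn't; note this says nothing about design quality
  }
}
```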
The LLM skeptics need to point out what differs about code compared to Chess, DotA, etc. from an RL perspective. I don't believe they can. Until they can, I'm going to assume that LLMs will soon be better than any living human at writing good code.
An obviously correct automatable objective function? Programming can be generally described as converting a human-defined specification (often very, very rough and loose) into a bunch of precise text files.
Sure, you can use proxies like compilation success / failure and unit tests for RL. But key gaps remain. I'm unaware of any objective function that can grade "do these tests match the intent behind this user request".
Contrast with the automatically verifiable "is a player in checkmate on this board?"
Also, the reward functions that you mention don't necessarily lead to great code, only to running code. The "should be possible" in the third bullet point does very heavy lifting.
At any rate, I can be convinced that LLMs will lead to substantially reduced teams. There is a lot of junior-level code that I can let an LLM write, and for non-junior-level code you can write/refactor things much faster than by hand, but you need a domain/API/design expert to supervise the LLM. I think in the end it makes programming much more interesting, because you can focus on the interesting problems and less on the boilerplate, searching API docs, etc.
So, it doesn't map cleanly onto previously solved problems, even though there's a decent amount of overlap. But I'd like to add a question to this discussion:
- Can we design clever reward models that punish bad architectural choices, executing on unclear intent, etc? I'm sure there's scope beyond the naive "make code that maps input -> output", even if it requires heuristics or the like.
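As a hedged sketch of what that might look like, assume crude static metrics (function length, nesting depth, exported API count) stand in for "architecture"; the thresholds and weights below are invented:

```typescript
// Toy penalty model: crude static metrics stand in for "architecture".
// All thresholds and weights are invented for illustration.
interface CodeMetrics {
  functionLengths: number[]; // lines per function in the change
  maxNestingDepth: number;
  publicApiCount: number;    // exported symbols added
}

function architecturePenalty(m: CodeMetrics): number {
  let penalty = 0;
  for (const len of m.functionLengths) {
    if (len > 60) penalty += (len - 60) * 0.01;                         // sprawling functions
  }
  if (m.maxNestingDepth > 4) penalty += (m.maxNestingDepth - 4) * 0.5;  // deep nesting
  if (m.publicApiCount > 20) penalty += (m.publicApiCount - 20) * 0.1;  // bloated surface area
  return penalty;
}

// Combined reward: functional correctness minus heuristic penalties.
const reward = (testsPass: boolean, m: CodeMetrics): number =>
  (testsPass ? 1 : 0) - architecturePenalty(m);
```

Heuristics like this are crude and gameable, but they show there is scope beyond a pass/fail signal.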
Look, I'm sure focusing on inputs instead of outcomes (not even outputs) will work out great for you. Good luck!
Personally, I'm at a software company where this new LLM wave hasn't made much of a difference.
Seeing the evidence you're thinking of would mean that LLMs will have solved software development by next month.
Better start applying!
Cheaper organisations that couldn't compete with you before will now be able to, and they will drive your revenue down.
I don't doubt you are successful, but the mentality and value hierarchy you seem to express here is something I never want to have anything to do with.
Your developers weren't just a cost but also a barrier to entry.
I too use multiple LLMs every day to help with my development work, and I agree with this statement. But I also recognize that just when we think LLMs are hitting a ceiling, they turn around and surprise us. A lot of progress is being made on the LLMs themselves, but also on tools like code editors. A very large number of very smart people are focused on this front, and a lot of resources are being directed here.
If the question is:
Will the LLMs get good at code design in 5 years?
I think the answer is:
Very likely.
I think we will still need software devs, but not as many as we do today.
You can’t just train a model on the 1,000 GitHub repos that are very well coded.
Smart people or not, LLMs require input, or it’s garbage in, garbage out.
I don't doubt you have a functioning business, but I also wouldn't be surprised if you get overtaken some day.
I'm waiting for LLMs to integrate directly into programming languages.
The discussions sound a bit like the early days of when compilers started coming out, and people had been using direct assembler before. And then decades after, when people complained about compiler bugs and poor optimizers.
I see the burden of proof has been reversed. That’s already stage 2 of the hubris cycle.
On a serious note, these are nothing alike. Games have a clear reward function. In software architecture it is extremely difficult to even agree on basic principles. We regularly invalidate previous “best advice”, and we have many conflicting goals. Tradeoffs are a thing.
Secondly, programming has negative requirements that aren’t verifiable. Security is the perfect example. You don’t make a crypto library with unit tests.
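A small illustration of such a negative requirement, assuming Node's built-in crypto module: both functions below satisfy "returns true iff the tokens match" and pass the same unit tests, yet only the second avoids a timing side channel.

```typescript
// Both functions satisfy "returns true iff the tokens match" and pass the same unit tests,
// but only the second avoids a potential timing side channel. No test suite tells them apart.
import { timingSafeEqual } from "node:crypto";

function verifyTokenNaive(expected: string, provided: string): boolean {
  return expected === provided; // comparison may bail out early and leak timing information
}

function verifyTokenConstantTime(expected: string, provided: string): boolean {
  const a = Buffer.from(expected);
  const b = Buffer.from(provided);
  if (a.length !== b.length) return false; // timingSafeEqual requires equal-length buffers
  return timingSafeEqual(a, b);
}
```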
Third, you have the spec problem. What is the correct logic in edge cases? That can be verified, but it needs to be decided first. It’s also a massive space of subtle decisions.
An LLM sees pagination, so it does pagination. After all, an LLM is an algorithm that calculates the probability of the next word in a sequence of words, nothing less and nothing more. An LLM does not think or feel, even though people believe it does, saying thank you and using polite words like "please". An LLM generates text based on what it was presented with. That's why it will happily invent research that does not exist, write a review of a product that does not exist, or invent a method that does not exist in a given programming language. And so on.
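As a toy illustration of "calculates the probability of the next word" (the vocabulary and the scores are made up):

```typescript
// Toy "next word" step: invented scores over a tiny vocabulary, softmax, then pick the most likely.
const vocabulary = ["pagination", "caching", "retries"];
const logits = [2.1, 0.3, -1.0]; // made-up scores a model might assign given the prompt so far

const exps = logits.map(Math.exp);
const total = exps.reduce((sum, e) => sum + e, 0);
const probabilities = exps.map((e) => e / total);

// The "answer" is simply whichever continuation is most probable; no intent, no understanding.
const nextToken = vocabulary[probabilities.indexOf(Math.max(...probabilities))];
console.log(nextToken, probabilities);
```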
These heuristics are certainly "good enough" that Stockfish is able to beat the strongest humans, but it's rarely possible for a chess engine to determine if a position results in mate.
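The classical hand-written kind of heuristic looks roughly like a material count; the piece values below are the textbook ones, and nothing in it says anything about forced mate:

```typescript
// Classical hand-written evaluation heuristic: a material count over a FEN piece placement.
// It estimates who is better; it proves nothing about whether a forced mate exists.
const pieceValues: Record<string, number> = { p: 1, n: 3, b: 3, r: 5, q: 9, k: 0 };

function materialScore(fenPiecePlacement: string): number {
  let score = 0;
  for (const ch of fenPiecePlacement) {
    const value = pieceValues[ch.toLowerCase()];
    if (value === undefined) continue;                 // skip digits and '/' separators
    score += ch === ch.toLowerCase() ? -value : value; // lowercase = black, uppercase = white
  }
  return score;
}

// Starting position: material is balanced, so the score is 0.
console.log(materialScore("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"));
```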
I guess the question is whether we can write a good enough objective function that would encapsulate all the relevant attributes of "good code".
"Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning"
Lolok. Neither do many people using “AI”, so what’s your point exactly?
It’s an odd thing to brag about being a dime-a-dozen “solutions” provider.
I'm more of an optimist in that regard. Yes, if you're looking at a very specific feature set/product that needs to be maintained/developed, you'll need fewer devs for that.
But we're going to see the Jevons Paradox with AI-generated code, just as we've seen it in web development, where few people are writing raw HTML anymore.
It's going to be fun when nontechnical people who maybe know a bit of Excel start vibe coding a large amount of software, some of which will succeed and require maintenance. That maintenance might not involve a lot of direct coding either, but it will require a good understanding of how software actually works.
The field of software engineering might be doomed if everyone worked like this user and replaced programmers with machines, or it might not, but those questions are sort of above his pay grade. AI destroying the symbiotic relationship between IT companies and their internal social clubs is a societal issue, a more macro-scale problem than the internal regulation mechanisms of free-market economies can be expected to solve.
I guess my point is, I don't know whether this guy or his company is real, but he passes my BS detector, and I know for a fact that CEOs of real medium-sized companies are like this. This is technically what everyone should aspire to be. If you think that's morally and completely, utterly wrong, congratulations on your first job.
There is already another reply referencing the Jevons Paradox, so I won't belabor that point. Instead, let me give an analogy. Imagine programmers today are like the scribes and monks of 1000 years ago, contemplating the impact of the printing press. Only 5% of the population knew how to read and write, so the scribes and monks felt they were going to be replaced. What happened is that the "job" of writing mostly went away, but every job came to require writing as a core skill. I believe the same will happen with programming. A thousand years from now, people will have a hard time imagining jobs that don't involve instructing computers in some form (just as today it's hard for us to imagine jobs that don't involve reading and writing).
The problem with LLMs isn't that they can't do great stuff: it's that you can't trust them to do it consistently. Which means you have to verify what they do, which means you need domain knowledge.
Until the next big evolution in LLMs or a revolution from something else, we'll be alright.
I agree they improve productivity to the point where you need fewer developers for a similar quantity of output. But I don't think LLMs specifically will reduce the need for some engineer to do the higher-level technical design and architecture work, given what I've seen and my understanding of the underlying tech.
My point actually has everything to do with making money. Making money is not a viable differentiator in and of itself. You need to put in work on your desired outcomes (or get lucky, or both) and the money might follow. My problem is that directives such as "software developers need to use tool x" is an _input_ with, at best, a questionable causal relationship to outcome y.
It's not about "social clubs for software developers", but about clueless execs. Now, it's quite possible that he's put in that work and that the outcomes are attributable to that specific input, but judging by his replies here I wouldn't wager on it. Also, as others have said, if that's the case, replicating their business model just got a whole lot easier.
> This is technically what everyone should aspire to be
No, there are other values besides maximizing utility.
What do you mean? What would this look like in your view?
Isn't this just a pot calling the kettle black? I'm not sure why either side has the rightful position of "my opinion is right until you prove otherwise".
We're talking about predictions for the future; anyone claiming to be "right" is lacking humility. The only thing going on is people justifying their opinions, and no one can offer "proof".
Say you and I ask Gemini what the perfect internal temperature for a medium-rare steak is. It tells me 72C, and it tells you 55C.
Even if it tells 990 people 55C and only 10 people 72C, with tens to hundreds of millions of users that is still a gargantuan number of ruined steaks.
But yes, I also once worked at a company (Factset) where the CTO had to put a stop to something that got out of hand. A very popular game at the time basically took over the mindshare of most of the devs for a while, and he caught them whiteboarding game strategies during work hours. (It was Starcraft 1 or 2, I forget. But both date me at this point.) So he put out a stern memo, which did halt it. And yeah, he was right to do that.
Just do me this favor- If a dev comes to you with a wild idea that you think is too risky to spend a normal workday on, tell them they can use their weekend time to try it out. And if it ends up working, give them the equivalent days off (and maybe an extra, because it sucks to burn a weekend on work stuff, even if you care about the product or service). That way, the bet is hedged on both sides. And then maybe clap them on the back. And consider a little raise next review round. (If it doesn't work out, no extra days off, no harm no foul.)
I think your attitude is in line with your position (and likely your success). I get it. Slightly more warmth wouldn't hurt, though.
But you're right.
Total drivel. It is beyond question that the use of the tools increases the capabilities and output of every single developer in the company in whatever task they are working on, once they understand how to use them. That is why the directive exists.
Everything between landing a contract and transferring deliverables is, for someone like him, already only questionably related to revenue. Software engineering has every scheme imaginable for tying developer paychecks to value created, and at best they're about as reliable as medical advice from an LLM. Adding LLMs into the mix probably doesn't look so risky to him.
> No, there are other values besides maximizing utility.
True, but again, that's above his pay grade as a player in a free-market capitalist economy, which is merely part of a modern society, albeit not a tiny part.
----
OT and might be weird to say: I think a lot of businesses would prefer vibe coding going forward to a team of competent engineers, solely because LLMs are more consistent(ly bad). Code quality doesn't matter, but consistency does; McDonald's basically dominates the hamburger market with one of the worst burgers ever, which is also by far the most consistent. Nobody loves it, but it's what sells.
Maybe you did, and as a developer I am sure it is more fun, easier, and more enjoyable to work in those places. That isn't what we offer, though. We offer something very simple: the opportunity for a developer to come in, work hard, probably not enjoy themselves, produce what we ask to the standard we ask, and get paid in return.
The core of these approaches is "self-play", which is where the "superhuman" qualities arise. The system plays billions of games against itself and uses the data from those games to further refine itself. It seems that an automated "referee" (objective function) is an inescapable requirement for unsupervised self-play.
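Schematically, the loop looks something like the sketch below; every type and name in it is a placeholder rather than a real API, and the key detail is that the automated referee sits inside the loop with no human involved:

```typescript
// Skeleton of a self-play round. Every type and name here is a placeholder, not a real API;
// the point is that the automated referee sits inside the loop with no human involved.
type Winner = 1 | -1 | 0;

interface Referee {
  isTerminal(state: string): boolean;          // has the game ended?
  apply(state: string, move: string): string;  // what does a legal move do?
  winner(state: string): Winner;               // who won?
}

interface Policy {
  chooseMove(state: string): string;
  updateFrom(games: { moves: string[]; winner: Winner }[]): void;
}

function selfPlayRound(policy: Policy, referee: Referee, gamesPerRound: number): void {
  const games: { moves: string[]; winner: Winner }[] = [];
  for (let i = 0; i < gamesPerRound; i++) {
    const moves: string[] = [];
    let state = "initial-state";
    while (!referee.isTerminal(state)) {
      const move = policy.chooseMove(state);
      moves.push(move);
      state = referee.apply(state, move);
    }
    games.push({ moves, winner: referee.winner(state) });
  }
  policy.updateFrom(games); // the system refines itself on its own games
}
```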
I would suggest that Stockfish and other older chess engines are not a good analogy for this discussion. Worth noting, though, that even Stockfish no longer uses a hand-written objective function on extracted features like you describe. It instead uses a highly optimized neural network trained on millions of positions from human games.
New expression to me, thanks.
But yes, and no. I’d agree in the sense that the null hypothesis is crucial, possibly the main divider between optimists and pessimists. But I’ll still hold firm that the baseline should be to predict that transformer-based AI differs from humans in ability, since everything from neural architecture to training and inference works differently. Most importantly, existing AI varies dramatically in ability across domains: it exceeds human ability in some and fails miserably in others.
Another way to interpret the advancement of AI is to view it as a mirror held up to our neurophysiology. Clearly, lots of things we thought were different, like pattern matching in audio or visual spaces, are more similar than we thought. Other things, like novel discovery and reasoning, appear to require different processes altogether (otherwise we’d see similar strength in those, given that the training data is full of them).
I’m not even talking about large codebases. It struggles to generate a valid ~400 LOC TypeScript file when that requires above-average type system knowledge. Try asking it to write a new-style decorator (added in 2023), and it mostly just hallucinates or falls back to the old syntax.
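For reference, here is roughly the kind of thing being asked for: a new-style (TC39 / TypeScript 5.0+) decorator takes a (value, context) pair rather than the legacy (target, key, descriptor) triple, and needs no experimentalDecorators flag. The logging behaviour below is just an arbitrary example:

```typescript
// New-style (TC39 / TypeScript 5.0+) method decorator: it receives (value, context),
// not the legacy (target, key, descriptor) triple. The logging is an arbitrary example.
function logged<This, Args extends unknown[], Return>(
  target: (this: This, ...args: Args) => Return,
  context: ClassMethodDecoratorContext<This, (this: This, ...args: Args) => Return>
) {
  const methodName = String(context.name);
  return function (this: This, ...args: Args): Return {
    console.log(`calling ${methodName}`);
    return target.call(this, ...args);
  };
}

class Greeter {
  @logged
  greet(name: string): string {
    return `hello, ${name}`;
  }
}

console.log(new Greeter().greet("world")); // logs "calling greet", then "hello, world"
```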
Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.
Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.
Please don't fulminate. Please don't sneer, including at the rest of the community.
Eschew flamebait
They fail at things requiring novel reasoning not already extant in their corpus, a sense of self, or an actual ability to continuously learn from experience, though those things can be programmed in manually as secondary, shallow characteristics.