There are still significant limitations: no amount of prompting will get current models to approach abstraction and architecture the way a person does. But I'm finding that these Gemini models are finally able to replace search and Stack Overflow for a lot of my day-to-day programming.
I find this sentiment increasingly worrisome. It's entirely clear that every last human will be beaten on code design in the coming years (I'm not going to argue over whether it's 1 or 5 years away; who cares?).
I wish people would just stop holding on to what amounts to nothing, and instead think and talk more about what can be done in this new world. We need good ideas, and I think this could be a place to advance them.
Can you point to _any_ evidence to support that human software development abilities will be eclipsed by LLMs other than trying to predict which part of the S-curve we're on?
Seems like the key question is: should we expect AI programming performance to scale well as more compute and specialised training is thrown at it? I don't see why not; it seems like an almost ideal problem domain:
* Short and direct feedback loops
* Relatively easy to "ground" the LLM by running code (see the sketch below)
* Self-play / RL should be possible (it seems likely that you could also optimise for aesthetics of solutions based on common human preferences)
* Obvious economic value (based on the multi-billion dollar valuations of vscode forks)
All these things point to programming being "solved" much sooner than, say, chemistry.
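To make the "ground the LLM by running code" bullet concrete, here's a rough Python sketch of the kind of loop I mean (a toy of my own, not anything a lab actually runs): take a candidate solution, execute it against tests, and turn the outcome into a reward signal.

```python
import subprocess
import sys
import tempfile
import textwrap

def grounded_reward(candidate_code: str, test_code: str, timeout: float = 5.0) -> float:
    """Execute a model-generated solution against tests and return a reward.

    Returns 1.0 if the combined script exits cleanly (all asserts pass),
    0.0 otherwise. In a real RL setup the candidate would come from the
    model and this signal would feed back into training; here it's just
    a toy harness to show why grounding is cheap for code.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # non-terminating candidates get zero reward

# Toy example: a "model-written" function plus a tiny test harness.
candidate = textwrap.dedent("""
    def add(a, b):
        return a + b
""")
tests = textwrap.dedent("""
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
""")
print(grounded_reward(candidate, tests))  # prints 1.0 if the candidate passes
```

Swap the toy asserts for real test suites, type checkers, or fuzzers and you get a cheap, automatic grading signal that most other domains simply don't have.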
Also, the reward functions you mention don't necessarily lead to great code, only to running code. The "should be possible" in your third bullet point is doing some very heavy lifting.
At any rate, I can be convinced that LLMs will lead to substantially smaller teams. There's a lot of junior-level code that I can let an LLM write, and for non-junior-level code you can write/refactor things much faster than by hand, but you need a domain/API/design expert to supervise the LLM. I think in the end it makes programming much more interesting, because you can focus on the interesting problems and less on the boilerplate, searching API docs, etc.
The problem with LLMs isn't that they can't do great stuff: it's that you can't trust them to do it consistently. Which means you have to verify what they do, which means you need domain knowledge.
Until the next big evolution in LLMs or a revolution from something else, we'll be alright.
Say you and I ask Gemini what the perfect internal temperature for a medium-rare steak is. It tells me 72°C, and it tells you 55°C.
Even if it tells 990 people 55°C and only 10 people 72°C, with tens to hundreds of millions of users that 1% error rate still adds up to a gargantuan number of ruined steaks.