zlacker

[return to "Gemini 2.5 Pro Preview"]
1. segpha+J4[view] [source] 2025-05-06 15:34:48
>>meetpa+(OP)
My frustration with using these models for programming in the past has largely been around their tendency to hallucinate APIs that simply don't exist. The Gemini 2.5 models, both pro and flash, seem significantly less susceptible to this than any other model I've tried.

There are still significant limitations, no amount of prompting will get current models to approach abstraction and architecture the way a person does. But I'm finding that these Gemini models are finally able to replace searches and stackoverflow for a lot of my day-to-day programming.

◧◩
2. jstumm+jH[view] [source] 2025-05-06 19:23:17
>>segpha+J4
> no amount of prompting will get current models to approach abstraction and architecture the way a person does

I find this sentiment increasingly worrisome. It's entirely clear that every last human will be beaten on code design in the upcoming years (I am not going to argue if it's 1 or 5 years away, who cares?)

I wished people would just stop holding on to what amounts to nothing, and think and talk more about what can be done in a new world. We need good ideas and I think this could be a place to advance them.

◧◩◪
3. ssalaz+gg1[view] [source] 2025-05-06 23:55:42
>>jstumm+jH
I code with multiple LLMs every day and build products that use LLM tech under the hood. I dont think we're anywhere near LLMs being good at code design. Existing models make _tons_ of basic mistakes and require supervision even for relatively simple coding tasks in popular languages, and its worse for languages and frameworks that are less represented in public sources of training data. I am _frequently_ having to tell Claude/ChatGPT to clean up basic architectural and design defects. Theres no way I would trust this unsupervised.

Can you point to _any_ evidence to support that human software development abilities will be eclipsed by LLMs other than trying to predict which part of the S-curve we're on?

◧◩◪◨
4. xyzzy1+Cw1[view] [source] 2025-05-07 03:10:09
>>ssalaz+gg1
I can't point to any evidence. Also I can't think of what direct evidence I could present that would be convincing, short of an actual demonstration? I would like to try to justify my intuition though:

Seems like the key question is: should we expect AI programming performance to scale well as more compute and specialised training is thrown at it? I don't see why not, it seems an almost ideal problem domain?

* Short and direct feedback loops

* Relatively easy to "ground" the LLM by running code

* Self-play / RL should be possible (it seems likely that you could also optimise for aesthetics of solutions based on common human preferences)

* Obvious economic value (based on the multi-billion dollar valuations of vscode forks)

All these things point to programming being "solved" much sooner than say, chemistry.

◧◩◪◨⬒
5. energy+WH1[view] [source] 2025-05-07 05:43:37
>>xyzzy1+Cw1
This is my view. We've seen this before in other problems where there's an on-hand automatic verifier. The nature of the problem mirrors previously solved problems.

The LLM skeptics need to point out what differs with code compared to Chess, DoTA, etc from a RL perspective. I don't believe they can. Until they can, I'm going to assume that LLMs will soon be better than any living human at writing good code.

◧◩◪◨⬒⬓
6. klabb3+df2[view] [source] 2025-05-07 12:09:45
>>energy+WH1
> The LLM skeptics need to point out what differs with code compared to Chess, DoTA, etc from a RL perspective.

I see the burden of proof has been reversed. That’s stage 2 already of the hubris cycle.

On a serious note, these are nothing alike. Games have a clear reward function. Software architecture is extremely difficult to even agree on basic principles. We regularly invalidate previous ”best advice”, and we have many conflicting goals. Tradeoffs are a thing.

Secondly programming has negative requirements that aren’t verifiable. Security is the perfect example. You don’t make a crypto library with unit tests.

Third, you have the spec problem. What is the correct logic in edge cases? That can be verified but needs to be decided. Also a massive space of subtle decisions.

[go to top]