zlacker

For one reason or another everyone seems to be sleeping on Gemini. I have been exclusively using Gemini 3 Flash to code these days and it stands up right alongside Opus and others while having a much smaller, faster and cheaper footprint. Combine it with Antigravity and you're basically using a cheat code.

replies(20): >>catlov+91 >>jckahn+M1 >>raluse+K2 >>satvik+R2 >>whalee+53 >>qaq+45 >>mfro+E7 >>pRusya+B9 >>bastaw+Ca >>Curiou+Ya >>OsrsNe+dc >>psyclo+qj >>notato+Po >>TZubir+3t >>TheAce+nu >>codazo+Jw >>jonath+hy >>aantix+sN >>jug+261 >>zrn900+mz5

>>paxys+(OP)
It's ok, but it too frequently edits WAY more than it needs to in order to accomplish the task at hand.

GPT-5.2 sometimes does this too. Opus-4.5 is the best at understanding what you actually want, though it is ofc not perfect.

>>paxys+(OP)
Yeah I don't understand why everyone seems to have forgotten about the Gemini options. Antigravity, Jules, and Gemini CLI are as good as the alternatives but are way more cost effective. I want for nothing with my $20/mo Google AI plan.

replies(3): >>paxys+02 >>codazo+yE >>riku_i+eX

>>jckahn+M1
Yeah I'm on the $20/mo Google plan and have been rate limited maybe twice in 2 months. Tried the equivalent Claude plan for a similar workload and lasted maybe 40 minutes before it asked me to upgrade to Max to continue.

replies(1): >>lelant+PI2

>>paxys+(OP)
I think Gemini is an excellent model, it's just not a particularly great agent. One of the reasons is that its code output is often structured in a way that looks like it's answering a question, rather than generating production code. It leaves comments everywhere, which are often numbered (which not only is annoying, but also only makes sense if the numbering starts within the frame of reference of the "question" it's "answering").

It's also just not as good at being self-directed and doing all of the rest of the agent-like behaviors we expect, i.e. breaking down into todolists, determining the appropriate scope of work to accomplish, proper tool calling, etc.

replies(2): >>freedo+F3 >>sutter+me

>>paxys+(OP)
Eh, it's not near Opus at all, closer to Sonnet. It is nice though with Antigravity because it's free versus being paid in other IDEs like Cursor.

replies(1): >>causal+Sg

>>paxys+(OP)
I think counter to the assumption of myself (and many), for long form agent coding tasks, models are not as easily hot swappable as I thought.

I have developed decent intuition on what kinds of problems Codex, Claude, Cursor(& sub-variants), Composer etc. will or will not be able to do well across different axes of speed, correctness, architectural taste, ...

If I had to reflect on why I still don't use Gemini, it's because they were late to the party and I would now have to be intentional about spending time learning yet another set of intuitions about those models.

replies(1): >>codazo+3F

>>raluse+K2
Yeah, you may have nailed it. Gemini is a good model, but in the Gemini CLI with a prompt like, "I'd like to add <feature x> support. What are my options? Don't write any code yet" it will proceed to skip right past telling me my options and will go ahead an implement whatever it feels like. Afterward it will print out a list of possible approaches and then tell you why it did the one it did.

Codex is the best at following instructions IME. Claude is pretty good too but is a little more "creative" than codex at trying to re-interpret my prompt to get at what I "probably" meant rather than what I actually said.

replies(3): >>michae+ki >>phaino+IO >>Pantal+UQ

>>paxys+(OP)
Maybe it's the types of projects I work on but Gemini is basically unusable to me. Settled on Claude Code for actual work and Codex for checking Claude's work. If I try to mix in Gemini it will hallucinate issues that do not exist in code at very high rate. Claude and Codex are way more accurate at finding issues that actually exist.

>>paxys+(OP)
For me it just depends on the project. Sometimes one or the other performs better. If I am digging into something tough and I think it's hallucinating or misunderstanding, I will typically try another model.

>>paxys+(OP)
It's the opposite experience for me. Gemini mostly produces made up and outdated stuff.

>>paxys+(OP)
I've never, ever had a good experience with Gemini (3 Pro). It's been embarrassingly bad every time I've tried it, and I've tried it lots of times. It overcomplicates almost everything, hallucinates with impressive frequency, and needs to be repeatedly nudged to get the task fully completed. I have no reason to continue attempting to use it.

replies(1): >>JoshMa+Pw

>>paxys+(OP)
Oddly enough, as impressive as Gemini 3 is, I find myself using it infrequently. The thing Gemini 2.5 had over the other models was dominance in long context, but GPT5.2-codex-max and Opus 4.5 Thinking are decent at long context now, and collectively they're better at all the use cases I care about.

>>paxys+(OP)
For all the hype I see about Gemini, we integrated it with our product (an AI agent) and it consistently performs worse[0] than Claude Sonnet, Opus, and ChatGPT 5.2

[0] based on user Thumbs up/Thumbs down voting

>>raluse+K2
My go-to models have been Claude and Gemini for a long time. I have been using Gemini for discussions and Claude for coding and now as an agent. Claude has been the best at doing what I want to do and not doing what I don’t want to do. And then my confidence in it took a quantum leap with Opus 4.5. Gemini seems like it has gotten even worse at doing what I want with new releases.

>>satvik+R2
Yeah use Flash 3 for easy + fast stuff, but it can't hold the plot like Opus or Codex 5

>>freedo+F3
Can you (or anyone) explain how this might be? The "agent" is just a passthrough for the model, no? How is one CLI/TUI tool better than any other, given the same model that it's passing your user input to?

I am familiar with copilot cli (using models from different providers), OpenCode doing the same, and Claude with just the \A models, but if I ask all 3 the same thing using the same \A model, I SHOULD be getting roughly the same output, modulo LLM nondeterminism, right?

replies(1): >>taylor+7Z

>>paxys+(OP)
I tried to use it, kept saying it was at max capacity and nothing would happen. I gave it a good day before giving up.

>>paxys+(OP)
I can think of one major reason why Microsoft and Apple would prefer to feed their codebases into Claude than to Gemini.

>>paxys+(OP)
I don't think anyone is sleeping on it.

It's on the top of most leaderboards on lmarena.ai

>>paxys+(OP)
This comment is a bit confusing and surprising to me because I tried Antigravity three weeks ago and it was very undercooked. Claude was actually able to identify bugs and get the bigger picture of the project, while Gemini 3 with Antigravity often kept focusing on unimportant details.

My default everyday model is still Gemimi 3 in AI Studio, even for programming related problems. But for agentic work Antigravity felt very early-stages beta-ware when I tried it.

I will say that at least Gemimi 3 is usually able to converge on a correct solution after a few iterations. I tried Grok for a medium complexity task and it quickly got stuck trying to change minor details without being able to get itself out.

Do you have any advice on how to use Antigravity more effectively? I'm open to trying it again.

replies(3): >>paxys+XD >>Analem+aK >>8note+Nr2

>>paxys+(OP)
I've used Gemini CLI a fair amount as well—it's included with our subscription at work. I like it okay, but it tends to produce "lies" a bit too often. It tends to produce language that reads as over confident that it's found a problem or solution. This causes me extra work to verify or causes me extra time because I believed it. In my experience Claude Code does this quite a bit less.

>>bastaw+Ca
Same. Sometimes even repeated nudges don't help. The underlying 3.0 Pro model is great to talk and ideate with, but its inability to deliver within the Gemini CLI harness is ... almost comical.

>>paxys+(OP)
I'm also using Gemini and it's the only option that consistently works for me so far. I'm using it in chat mode with copy&paste and it's pleasant to work with.

Both Claude and ChatGPT were unbearable, not primarily because of lack of technical abilities but because of their conversational tone. Obviously, it's pointless to take things personally with LLMs but they were so passive-aggressive and sometimes maliciously compliant that they started to get to me even though I was conscious of it and know very well how LLMs work. If they had been new hires, I had fired both of them within 2 weeks. In contrast, Gemini Pro just "talks" normally, task-oriented and brief. It also doesn't reply with files that contain changes in completely unrelated places (including changing comments somewhere), which is the worst such a tool could possibly do.

Edit: Reading some other comments here I have to add that the 1., 2. ,3. numbering of comments can be annoying. It's helpful for answers but should be an option/parameterization.

replies(2): >>boness+bT >>lelant+gK2

>>TheAce+nu
Ask it to verify stuff in the browser. It can open a special Chrome instance, browse URLs, click and scroll around, inspect the DOM, and generally do whatever it takes to verify that the problem is actually solved, or it will go back and iterate more. That feedback loop IMO makes it very powerful for client-side or client-server development.

replies(1): >>sherlo+Mx3

>>jckahn+M1
It's crazy that we're having such different experiences. I purchased the Google AI plan as an alternative to my ChatGPT (Codex) daily driver. I use Gemini a fair amount at work, so I thought it would be a good choice to use personally. I used it a few times but ran into limits the first few projects I worked on. As a result I switched to Claude and so, far, I haven't hit any limits.

>>whalee+53
I feel like "prompting language" doesn't translate over perfectly either. It's like we become experts at operating a particular AI agent.

I've been experimenting with small local models and the types of prompts you use with these are very different than the ones you use with Claude Code. It seems less different between Claude, Codex, and Gemini but there are differences.

It's hard to articulate those differences but I think that I kind of get in a groove after using models for a while.

>>TheAce+nu
I've mentioned this before, but I think Gemini is the smartest raw model for answering programming questions in chatbot mode, but these CC/Codex/gemini-cli tools need more than just the model, the harness has to be architected intelligently and I think that's where Google is behind for the moment.

>>paxys+(OP)
Not my experience at all.

It fails to be pro-active. "Why didn't you run the tests you created?"

I want it to tell me if the implementation is working.

Feels lazy. And it hallucinates solutions frequently.

It pales in comparison to CC/Opus.

replies(1): >>zhengy+ZP

>>freedo+F3
Try the conductor extension for gemini-cli: https://github.com/gemini-cli-extensions/conductor

It won't make any changes until a detailed plan is generated and approved.

>>aantix+sN
I feel like this is exactly the use case for things like Hooks and Skills. Which, if you don't want to write them yourself, I get it. But I do think we can get the tool to do it; sounds like you want it doing that a little more actively/out-of-the-box?

>>freedo+F3
I've had the exact opposite experience. After including in my prompt "don't write any code yet" (or similar brief phrase), Gemini responds without writing code.

Using Gemini 2.5 or 3, flash.

>>jonath+hy
I think you’re highlighting an aspect of agentic coding that’s undervalued: what to do once trust is breached… ?

With humans you can categorically say ‘this guy lies in his comments and copy pastes bullshit everywhere’ and treat them consistently from there out. An LLM is guessing at everything all the time. Sometimes it’s copying flawless next-level code from Hacker News readers, sometimes it’s sabotaging your build by making unit tests forever green. Eternal vigilance is the opposite of how I think of development.

>>jckahn+M1
Google has uncertain privacy settings, there is no declaration they won't train their LLM on your personal/commercial code.

replies(1): >>Zopieu+212

>>michae+ki
maybe different preparatory "system" prompts?

>>paxys+(OP)
I've heard Opus 4.5 might have an edge especially in long running agentic coding scenarios (?) but personally yes Gemini 3 series is what I was expecting GPT-5 to be.

I'm also mostly on Gemini 3 Flash. Not because I've compared them all and I found it the best bar none, but because it fulfills my needs and then some, and Google has a surprisingly little noted family plan for it. Unlike OpenAI, unlike Anthropic. IIRC it's something like 5 shared Gemini Pro subs for the price of 1. Even being just a couple sharing it, it's a fantastic deal. My wife uses it during studies, I professionally with coding and I've never run into limits.

>>riku_i+eX
https://macaron.im/blog/ai-assistant-privacy-comparison#:~:t...

All providers are opt-out. The moat is the data, don't pretend like you don't know.

replies(1): >>riku_i+172

>>Zopieu+212
per my previous research there is no opt out for gemini cli.

>>TheAce+nu
gemini flash in their claude code compatitor does pretty well if you give it alternative tools.

the tools its built with seem to suck, but it can cook with serena mcp.

the flash models seem to get better results than the pro ones as far as ive seen, but theres not a big difference

>>paxys+02
> Yeah I'm on the $20/mo Google plan and have been rate limited maybe twice in 2 months. Tried the equivalent Claude plan for a similar workload and lasted maybe 40 minutes before it asked me to upgrade to Max to continue.

The TLDR: The $20/40m cost is more reflective of what inference actually costs, including the amortised cost of the Capex, together with the Opex.

The Long Read:

I think the reason is because Anthropic is attempting to run inference at a profit and Google isn't.

Another reason could be that they don't own their cost centers (GPUs are from Nvidia, Cloud instances are from AWS, data centers from AWS, etc); they own only the model but rent everything else needed for inference so pay a margin for all those rented cost centers.

Google owns their entire vertical (GPUs are google-made, Cloud instances and datacenters are Google-owned, etc) and can apply vertical cost optimisations, so their final cost of inference is going to be much cheaper anyway even if they were not subsidising inference with their profits from unrelated business units.

replies(1): >>jckahn+Gf3

>>jonath+hy
> Both Claude and ChatGPT were unbearable, not primarily because of lack of technical abilities but because of their conversational tone.

It's pretty much trial and error.

I tried using ChatGPT via the webchat interface on Sunday and it was so terse and to the point that it was basically useless. I had to repeatedly prompt for all the hidden details that I basically gave up and used a different webchat LLM (I regularly switch between ChatGPT, Claude, Grok and Gemini).

When I used it a month ago, it would point out potential footguns, flaws, etc. I suppose it just reinforces the point that "experience" gained using LLMs is mostly pointless, your experience gets invalidated the minute a model changes, or a system prompt changes, etc.

For most purposes, they are all mostly the same i.e. produce output so similar you won't notice a difference.

>>lelant+PI2
Well said.

It's for exactly this reason that I believe Google will win the AI race.

>>paxys+XD
How does that work? Do you have a link to documentation?

>>paxys+(OP)
What everyone is really sleeping on is Deepseek paid API with Cline and VSCode. An agent that can refactor entire codebases with a 128.0k context window that costs dimes. It generates entire blocks of code and tests them for $0.02 a pop. Deepseek paid API brings the low cost large context window and memory. VSCode the interface, CLine the agent.