They measure the old gemini 2.5 generating proper diffs 92% of the time. I bet this goes up to ~95-98% https://aider.chat/docs/leaderboards/
Question for the google peeps who monitor these threads: Is gemini-2.5-pro-exp (free tier) updated as well, or will it go away?
Also, in the blog post, it says:
> The previous iteration (03-25) now points to the most recent version (05-06), so no action is required to use the improved model, and it continues to be available at the same price.
Does this mean gemini-2.5-pro-preview-03-25 now uses 05-06? Does the same apply to gemini-2.5-pro-exp-03-25?
Update: I just tried updating the date in the exp model name (gemini-2.5-pro-exp-05-06) and that doesn't work.
I don't have a formal benchmark but there's a notable improvement in code generation due to this alone.
I've had gemini chug away on plans that have taken ~1 hour to implement (~80M tokens spent). A good portion of that effort went to fixing mistakes that cline/aider/roo made because of failed search/replace edits. If this model gets anywhere close to 100% on diffs then this is a BFD. I estimate this will translate to a 50-75% productivity boost on long-context coding tasks. I hope the initial results I'm seeing hold up!
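For anyone wondering why diff accuracy matters this much, here's a rough sketch of how a search/replace edit can fail. This is not how cline, aider, or roo actually implement it; `apply_search_replace` is a hypothetical helper I made up, and I'm assuming the tool requires the search text to match the file exactly once.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one search/replace edit (hypothetical helper, not a real tool's API).

    Assumption: the edit is only valid if the search text occurs exactly once.
    """
    if not search:
        raise ValueError("search text must not be empty")
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for search text, found {count}")
    return source.replace(search, replace, 1)


original = "def add(a, b):\n    return a + b\n"

# Well-formed edit: the model reproduced the existing line exactly.
good = apply_search_replace(
    original, "    return a + b\n", "    return a + b  # sum\n"
)

# Malformed edit: the model changed the whitespace, so nothing matches
# and the edit is rejected instead of being applied.
try:
    apply_search_replace(original, "  return a+b\n", "    return a - b\n")
    failed = False
except ValueError:
    failed = True
```

Every rejected edit like the second one means another round trip to the model (or a fallback to rewriting the whole file), which is where the wasted tokens come from on long tasks.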
I'm surprised by the reaction in the rest of the thread. A lot of unproductive complaining, a lot of off-topic stuff, nothing talking about the model itself.
Any thoughts from anyone else using the updated model?