My understanding/experience is that LLM performance in a language scales with how well the language is represented in the training data.
From that assumption, we might expect LLMs to actually do better with an existing language for which more training code is available, even if that language is more complex and seems like it should be “harder” to understand.
This does fill up context a little faster, (1) not as much as debugging the problem would have in a dynamic language, and (2) better agentic frameworks are coming that “rewrite” context history for dynamic on the fly context compression.
I still experience agents slipping in a `todo!` and other hacks to get code to compile, lint, and pass tests.
The loop with tests and doc tests are really nice, agreed, but it'll still shit out bad code.
I manage a team of interns and I don't have the energy to babysit an agent too. For me, gpt and gemini yield the best talk-it-through approach. For example, dropping a research paper into the chat and describing details until the implementation is clarified.
We also use Claude and Cursor, and that was an exceptionally disruptive experience. Huge, sweeping, wrong changes all over. Gyah! If I bitch about todo! macros, this is where they came from.
For hobby projects, I sometimes use whatever free agent microsoft is shilling via VS Code (and me selling my data) that day. This is relatively productive, but reaches profoundly wrong conclusions.
Writing for CLR in visual studio is the smoothest smart-complete experience today.
I have not touched Grok and likely won't.
/ two pennies
Hope that answers your questions.