zlacker

[parent] [thread] 1 comment
1. theweb+(OP)[view] [source] 2025-05-06 21:13:22
I've found, like you mentioned, that the tech stack you work with matters a lot in terms of successful results from LLMs.

Python is generally fine, as you've experienced, as is JavaScript/TypeScript & React.

I've had mixed results with C# and PowerShell. With PowerShell, hallucinations are still a big problem. Not sure if it's the Verb-Noun naming scheme of cmdlets, but most models still make up cmdlets that don't exist on the fly (the model will correct itself once you point out that the cmdlet doesn't exist, but at that point, why bother when I can just do it myself correctly the first time?).

With C#, even with my existing code as context, it can't adhere to a consistent style and can't handle nullable reference types (albeit a relatively new feature in C#). The code works, but I have to spend too much time correcting it.
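For anyone who hasn't used nullable reference types: they make nullability part of the type itself, so the compiler flags any dereference that isn't guarded by a null check, and that's exactly the kind of annotation models tend to drop or get wrong. TypeScript's strictNullChecks is the analogous feature, so here's a minimal sketch in TS (function names are made up for illustration):

```typescript
// Hypothetical sketch: like C#'s nullable reference types, TypeScript's
// strictNullChecks makes nullability part of the type, so the compiler
// rejects unguarded dereferences of a possibly-null value.
function findName(id: number): string | null {
  return id === 0 ? null : "user-" + id;
}

function nameLength(id: number): number {
  const name = findName(id);
  // Under strictNullChecks, `return name.length;` right here would be a
  // compile error -- the null case has to be handled explicitly first.
  if (name === null) {
    return 0;
  }
  return name.length;
}
```

The failure mode described above is a model emitting the unguarded version and leaving you to add the null handling yourself.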

Given my own experiences and the stacks I work with, I still won't trust an LLM in agent mode. I make heavy use of them as a better Google (especially since Google has gone to shit) and as something to bounce ideas off of, but I'll still write the code myself. I don't like reviewing code, and having LLMs write code for me just turns me into a full-time code reviewer, which is not something I'm terribly interested in becoming.

I still get a lot of value out of the tools, but I'm still hesitant to unleash them on my code directly. I'll stick with the chat interface for now.

edit: Golang is another language I've had problems relying on LLMs for. On the flip side, LLMs have been great for me with SQL, and I'm grateful for that.

replies(1): >>neonsu+d7
2. neonsu+d7[view] [source] 2025-05-06 22:12:15
>>theweb+(OP)
FWIW, if you are using GitHub Copilot Edit/Agent mode, you may have more luck with other plugins. Until recently, Claude 3.5 Sonnet worked really well with C# and required relatively few extra commands to stay consistent with the "newest, tersest" style. But then, from my understanding, there was a big change in how the Copilot extension handles attached context, alongside what I presume were changes to prompting and fine-tuning, which resulted in severe degradation of output quality. Hell, even attaching context data doesn't work properly 1 time out of 3. At least Gemini 2.5 Pro can write tests semi-competently, but I still can't fathom how they managed to make it so much worse!