Rather, the advantage for LLMs in strongly typed languages is that the compiler can catch errors early and give the model automated feedback, so you don't have to.
With weakly typed (and typically interpreted) languages, they need to actually run the code to get that feedback, which may be quite slow or simply not realistic.
Simply put, agentic coding loops benefit from stronger static analysis.
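To make the feedback-loop point concrete, here's a toy sketch (my own illustrative function, not from anyone's codebase): a static checker like mypy would flag the type mismatch below without executing anything, while plain Python only surfaces it when that line actually runs.

```python
def total_price(prices: list[float]) -> float:
    # mypy-style checking rejects a call with list[str] before execution;
    # the interpreter itself won't complain until the bad line runs.
    return sum(prices)

print(total_price([1.5, 2.5]))  # fine: 4.0

try:
    # A type checker flags this call statically; plain Python only
    # raises once sum() tries to add 0 + "1.5".
    total_price(["1.5", "2.5"])
except TypeError as e:
    print("caught only at runtime:", e)
```

An agent iterating against the compiler/checker gets that signal in milliseconds; an agent iterating against a test run has to pay the full execution cost every loop.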
The only problem I've ever had: on maybe three occasions total it's added a stray return statement, I assume because of the syntactic similarity with Ruby.
also, some non-static languages have a habit of least surprise in their codebases -- it's often possible to effectively guess the types flowing through at the call site. needing zero rounds of refactoring feedback is better than needing even one.
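A small made-up example of what "guessing types at the call site" looks like in practice -- even with no annotations on the function, the operations used inside pin the types down:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

def describe(user):
    # No annotation on `user`, but .upper() and `+ 1` make it obvious
    # that name is a str and age is an int.
    return f"{user.name.upper()} turns {user.age + 1} next year"

print(describe(User("ada", 36)))  # ADA turns 37 next year
```

In a codebase written with that kind of discipline, a reader (human or model) rarely needs to chase definitions to know what's flowing through.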
But those are exactly the mistakes most humans make when writing bash scripts, which is what makes those scripts inherently flaky.
Ask it to write code in a language with strict types, a “logical” syntax with no tricky gotchas, and a compiler that enforces those rules, and while LLMs struggle to begin with, they eventually produce code that is nearly clean and bug-free. It works much better if there is an existing codebase where they can observe and learn from established patterns.
On the other hand, ask them to write JavaScript or Python and sure, they fly, but they confidently implement code full of hidden bugs.
The whole “amount of training data” argument is completely overblown. I’ve seen them do well even with my own made-up DSL. If the rules are logical and you explain the rules to it and show it existing patterns, they can mostly do alright. Conversely, there is so much bad JavaScript and Python code in their training data that I struggle to get them to produce code in my style in those languages.