For example, Claude can fluently generate Bevy code as of its training cutoff date, and there's no way there's enough training data on the web to explain this. There's almost certainly an agent somewhere in a compile-test loop generating Bevy examples.
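A compile-test loop of that kind is simple in outline. This is a minimal sketch, not anything known about how such pipelines actually work: `generate_candidate` is a hypothetical stub standing in for an LLM call, and Python's built-in `compile()` stands in for a real compiler like rustc.

```python
def generate_candidate(prompt, error=None):
    # Hypothetical stub for an LLM call. A real loop would send the
    # prompt, plus the previous compiler error, back to the model.
    if error is None:
        return "def add(a, b) return a + b"      # buggy first draft
    return "def add(a, b):\n    return a + b"    # revised draft

def compile_test_loop(prompt, max_rounds=5):
    # Repeatedly generate, compile, and feed errors back.
    # Only candidates that compile are kept as training examples.
    error = None
    for _ in range(max_rounds):
        src = generate_candidate(prompt, error)
        try:
            compile(src, "<candidate>", "exec")  # cheap syntax check
            return src
        except SyntaxError as e:
            error = str(e)                       # feed error back in
    return None
```

The filtering is the point: every example that survives the loop is known to compile, which is a quality bar scraped web code doesn't meet.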
A custom LLM-oriented language could have fine-grained fuzzing, mocking, concurrent calling, memoization, and other features that let LLMs generate and debug synthetic code more effectively.
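To make the fuzzing idea concrete, here is a rough sketch of what a built-in, fine-grained fuzzer could hand back to a model. Everything here (`fuzz_check`, the property and generator callables) is a hypothetical illustration, not part of any existing language:

```python
import random

def fuzz_check(fn, prop, gen, trials=200):
    # A language with first-class fuzzing could expose something like
    # this as a primitive: generate random inputs, check a property,
    # and return a concrete counterexample for the LLM to debug against.
    for _ in range(trials):
        args = gen()
        if not prop(fn, args):
            return args   # counterexample: precise feedback, not a stack trace
    return None           # no violation found in `trials` attempts
```

A counterexample like `(7, 63)` is far more useful to a code-generating model than a generic test failure, because it pins down exactly which inputs break the property.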
If that works, there's a pathway to a novel language having higher-quality training data than even Python.
I wrote this custom language. It's on GitHub, but the example code that would have been available for training would be very limited.
I gave it two inputs -- the original bash script and an example written in my pipeline language (for unrelated jobs). The code it gave me was syntactically correct and really close to the final version; I didn't have to edit much to get it exactly where I wanted it.
This is to say -- if a novel language's syntax is reasonably similar to an existing one, the LLM will be surprisingly good at writing it.