In the fullness of time, you end up having to. Or at least I have. Which is why, at this point, I always dislike additional layers and transforms.
(e.g. when I think about React Native on Android, I hear "now I'll have to be excellent at React/JavaScript and Android/Java/Kotlin and C++ to be able to debug the bridge", not "I can get away with just JavaScript".)
I'm not necessarily against the approach shown here (reducing token count for more efficient LLM generation), but if it catches on, humans will read and write it, build debuggers and tooling for it, and so on. It will definitely not stay a perfectly hidden layer underneath.
But why not, for programming models, just select tokens that map concisely to existing programming languages? Would that not be as effective?