zlacker

[return to "Nanolang: A tiny experimental language designed to be targeted by coding LLMs"]
1. abraxa+dh[view] [source] 2026-01-19 23:29:26
>>Scramb+(OP)
It seems that something that does away with human-friendly syntax and leans more towards a pure AST representation would be even better? Basically a Lisp with very strict typing might do the trick. And most LLMs are probably trained on lots of Lisps already.
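Rough sketch of what I mean, with Python dataclasses standing in for the typed AST (every node and type name here is invented for illustration, none of it comes from Nanolang itself):

    from dataclasses import dataclass
    from typing import Union

    # Hypothetical node types for a strictly typed, Lisp-like AST.

    @dataclass(frozen=True)
    class IntLit:
        value: int
        type: str = "Int"

    @dataclass(frozen=True)
    class Var:
        name: str
        type: str

    @dataclass(frozen=True)
    class Call:
        fn: str
        args: tuple["Expr", ...]
        type: str

    Expr = Union[IntLit, Var, Call]

    # (add (x : Int) 1) : Int  -- the model emits the tree directly as data
    # instead of human-friendly surface syntax.
    expr: Expr = Call(fn="add", args=(Var("x", "Int"), IntLit(1)), type="Int")

    def check(e: Expr) -> str:
        # Every node carries its type, so "checking" here is just walking
        # the tree and reading the annotations back.
        if isinstance(e, Call):
            for a in e.args:
                check(a)
        return e.type

    print(check(expr))  # -> Int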
2. verdve+bv[view] [source] 2026-01-20 01:47:43
>>abraxa+dh
Generally it seems like a bad idea to have your LLM write in languages you do not understand or write yourself
3. catlif+0z[view] [source] 2026-01-20 02:20:58
>>verdve+bv
Doesn’t that apply to the OP as well?
4. verdve+uA[view] [source] 2026-01-20 02:34:22
>>catlif+0z
Yes, I'm not going to fill my precious context with documentation for a programming language

This seems like a research dead end to me; the fundamentals are not there

5. catlif+TU[view] [source] 2026-01-20 06:13:21
>>verdve+uA
It seems kind of silly that you can’t teach an LLM new tricks though, doesn’t it? This sounds less like an intrinsic limitation and more like an artifact of how we produce model weights today.
6. verdve+3M1[view] [source] 2026-01-20 13:46:00
>>catlif+TU
getting tricks embedded into the weights is expensive; it doesn't happen in a single pass

that's why we teach them new tricks on the fly (in-context learning) with instruction files
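e.g. something like this, where the "trick" lives in a spec file that rides along in the prompt rather than in the weights (NANOLANG_SPEC.md and call_model are made-up stand-ins, not anything from the OP's project):

    from pathlib import Path

    def build_prompt(task: str, spec_path: str = "NANOLANG_SPEC.md") -> str:
        # In-context learning: the language spec is pasted into the prompt
        # instead of being baked into the weights during training.
        spec = Path(spec_path).read_text()
        return (
            "You write programs in the following language.\n\n"
            f"Spec:\n{spec}\n\n"
            f"Task: {task}\n"
        )

    # prompt = build_prompt("write a function that reverses a list")
    # reply = call_model(prompt)  # call_model = whatever LLM client you use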

7. catlif+oZ1[view] [source] 2026-01-20 15:12:43
>>verdve+3M1
Right, it sounds like an artificial limitation.
8. verdve+Ou2[view] [source] 2026-01-20 17:13:41
>>catlif+oZ1
it's more a mathematical / algorithmic limitation
9. catlif+RT3[view] [source] 2026-01-21 01:11:05
>>verdve+Ou2
I’ll counter that it’s an architectural issue
10. verdve+vs4[view] [source] 2026-01-21 07:09:24
>>catlif+RT3
I would put that under the umbrella of algo/math, i.e. the structure of the LLM is part of the algo, which is itself governed by math

For example, DeepSeek has done some interesting things with attention via changes to the structures / algos, but all of this is still optimized by gradient descent, which is why models do not learn facts and such from a single pass. It takes many passes to refine the weights that go into the math formulas
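Toy illustration of the "many passes" point, fitting a single scalar weight with plain gradient descent (pure toy numbers, nothing to do with any real model):

    # "Learn" the fact w = 5 by fitting w * 2 == 10 with squared-error loss.
    w = 0.0
    lr = 0.05
    for step in range(50):
        pred = w * 2.0
        grad = 2.0 * (pred - 10.0) * 2.0   # d/dw of (w*2 - 10)^2
        w -= lr * grad
        if step in (0, 49):
            print(step, round(w, 3))
    # step 0  -> w = 2.0   (one update gets nowhere near the target)
    # step 49 -> w = 5.0   (it takes many small updates to settle there)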

11. catlif+zu4[view] [source] 2026-01-21 07:26:03
>>verdve+vs4
> I would put that under the umbrella of algo/math, i.e. the structure of the LLM is part of the algo, which is itself governed by math

Yes you’re right. I misspoke.

I’m curious if there are ways to get around the monolithic nature of today’s models. There have to be architectures where a generalized model can coordinate specialized models that are cheaper to train, e.g. calling into a tool which is actually another model. Pre-LLM this was called boosting or “ensemble of experts” (I’m sure I’m butchering some nuance there).
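Something in the spirit of what I mean, with the specialist "models" stubbed out as plain functions (every name here is invented for illustration; in a real setup each specialist would be its own smaller, separately trained model exposed as a tool):

    from typing import Callable, Dict

    def code_specialist(task: str) -> str:
        return f"[code model] program for: {task}"

    def math_specialist(task: str) -> str:
        return f"[math model] solution for: {task}"

    SPECIALISTS: Dict[str, Callable[[str], str]] = {
        "code": code_specialist,
        "math": math_specialist,
    }

    def generalist(task: str) -> str:
        # Stand-in for the coordinating model: it only decides where to
        # route the task, it doesn't do the work itself.
        route = "math" if any(ch.isdigit() for ch in task) else "code"
        return SPECIALISTS[route](task)

    print(generalist("integrate x^2 from 0 to 3"))
    print(generalist("reverse a linked list"))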
