>>edanm (OP)
Probably because it doesn't maintain long-term cohesion. Transformer models are great at producing text that looks right over short distances, but as the output grows longer it often becomes contradictory or nonsensical: self-attention only sees a fixed-length context window, so nothing outside that window can constrain what comes next.
To get good output at larger scales we're going to need a model that is hierarchical, with self-attention that operates over longer ranges.
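
For what it's worth, here's a rough sketch of the kind of thing I mean (PyTorch; every name here is illustrative, not any existing library's API): tokens attend locally within fixed-size chunks, each chunk gets pooled into a summary vector, and a second attention layer mixes those summaries across the whole sequence.

    import torch
    import torch.nn as nn

    class HierarchicalSelfAttention(nn.Module):
        def __init__(self, dim, heads, chunk_size):
            super().__init__()
            self.chunk_size = chunk_size
            # Level 1: fine-grained attention inside each chunk.
            self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            # Level 2: coarse attention across chunk summaries.
            self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):
            batch, seq_len, dim = x.shape
            assert seq_len % self.chunk_size == 0, "pad to a multiple of chunk_size"
            n_chunks = seq_len // self.chunk_size

            # Local pass: fold chunks into the batch dim so each chunk
            # attends only to its own tokens (cost ~ chunk_size^2, not seq_len^2).
            chunks = x.reshape(batch * n_chunks, self.chunk_size, dim)
            local, _ = self.local_attn(chunks, chunks, chunks)

            # Summarize each chunk (mean pooling; a learned query would also work).
            summaries = local.mean(dim=1).reshape(batch, n_chunks, dim)

            # Global pass: summaries attend to each other across the full
            # sequence, at chunk granularity -- this is the long-range part.
            mixed, _ = self.global_attn(summaries, summaries, summaries)

            # Broadcast each chunk's global context back onto its own tokens.
            context = mixed.repeat_interleave(self.chunk_size, dim=1)
            return local.reshape(batch, seq_len, dim) + context

    # 8 chunks of 128 tokens instead of one dense 1024x1024 attention matrix.
    block = HierarchicalSelfAttention(dim=64, heads=4, chunk_size=128)
    out = block(torch.randn(2, 1024, 64))

Causal masking, positional encoding, etc. are omitted; the point is just that the quadratic cost is paid per chunk plus once over the summaries, so the receptive field can span the whole document without a full seq_len x seq_len attention matrix.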