Other architectures exist, but you can notice from the lack of people talking about them that they don't produce any output nearly as developed as the chatGPT kind. They will get there eventually, but that's not what we are seeing here.
What is that?
To get good output on larger scales we're going to need a model that is hierarchical with longer term self attention.