>>edanm (OP)
Probably because it doesn't maintain long-term cohesion. Transformer models are great at producing text that looks right over short distances, but as the output grows longer it often becomes contradictory or nonsensical: self-attention only sees a fixed-length context window, so nothing outside that window can constrain what comes next.
To get good output at larger scales we're going to need a model that is hierarchical, with self-attention that operates over longer ranges.
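
For what it's worth, here's a rough sketch of the kind of thing I mean (PyTorch; every name here is illustrative, not any existing library's API): tokens attend locally within fixed-size chunks, each chunk gets pooled into a summary vector, and a second attention layer mixes those summaries across the whole sequence.

    import torch
    import torch.nn as nn

    class HierarchicalSelfAttention(nn.Module):
        def __init__(self, dim, heads, chunk_size):
            super().__init__()
            self.chunk_size = chunk_size
            # Level 1: fine-grained attention inside each chunk.
            self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            # Level 2: coarse attention across chunk summaries.
            self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):
            batch, seq_len, dim = x.shape
            assert seq_len % self.chunk_size == 0, "pad to a multiple of chunk_size"
            n_chunks = seq_len // self.chunk_size

            # Local pass: fold chunks into the batch dim so each chunk
            # attends only to its own tokens (cost ~ chunk_size^2, not seq_len^2).
            chunks = x.reshape(batch * n_chunks, self.chunk_size, dim)
            local, _ = self.local_attn(chunks, chunks, chunks)

            # Summarize each chunk (mean pooling; a learned query would also work).
            summaries = local.mean(dim=1).reshape(batch, n_chunks, dim)

            # Global pass: summaries attend to each other across the full
            # sequence, at chunk granularity -- this is the long-range part.
            mixed, _ = self.global_attn(summaries, summaries, summaries)

            # Broadcast each chunk's global context back onto its own tokens.
            context = mixed.repeat_interleave(self.chunk_size, dim=1)
            return local.reshape(batch, seq_len, dim) + context

    # 8 chunks of 128 tokens instead of one dense 1024x1024 attention matrix.
    block = HierarchicalSelfAttention(dim=64, heads=4, chunk_size=128)
    out = block(torch.randn(2, 1024, 64))

Causal masking, positional encoding, etc. are omitted; the point is just that the quadratic cost is paid per chunk plus once over the summaries, so the receptive field can span the whole document without a full seq_len x seq_len attention matrix.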