They always hope the speed increase makes up for the lower quality, but it never does. The quadratic time seems inherent to the problem.
Indeed, there are lower bounds showing that sub-n^2 algorithms can't work: https://arxiv.org/pdf/2302.13214
This paper at least aspires to reproduce 'true' attention, which distinguishes it from many of the others. TBD whether it succeeds.
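For context, here is a minimal sketch of exact ("true") softmax attention in plain Python. It assumes a single head with no masking and no batching, and the function name `exact_attention` is just illustrative. Each of the n queries is scored against all n keys, so the number of scores grows as n^2. That quadratic term is the cost the faster approximations try to avoid.

```python
import math

def exact_attention(Q, K, V):
    """Exact softmax attention: out[i] = sum_j softmax_j(q_i . k_j / sqrt(d)) * v_j.

    Q, K, V are lists of n vectors (lists of floats). Returns the output
    vectors and the number of query-key scores computed, which is n * n.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    scores_computed = 0
    for q in Q:
        # One score per key: looping over all n keys for each of the n
        # queries is what makes exact attention quadratic in sequence length.
        scores = [scale * sum(qa * ka for qa, ka in zip(q, k)) for k in K]
        scores_computed += len(scores)
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of value vectors, one output coordinate at a time.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out, scores_computed
```

Subquadratic schemes typically skip, sparsify, or low-rank-approximate some of those n^2 scores. That is where the quality loss comes from when the approximation doesn't match the exact weights.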