zlacker

1. shenbe+ (OP) 2026-02-04 12:15:41
There are two ingredients that don't fit the "attention-is-kernel-smoothing" framing as far as I can tell: positional encoding and causal masking (which is arguably just another form of positional encoding).
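To make the framing concrete, here's a minimal NumPy sketch of the correspondence: softmax attention is Nadaraya-Watson kernel regression with an exponential kernel, and the causal mask is the part that breaks the plain smoothing picture. This is an illustration of the analogy, not anyone's reference implementation; all function names are mine.

```python
import numpy as np

def kernel_smooth(queries, keys, values):
    """Nadaraya-Watson kernel regression:
    y_i = sum_j K(q_i, k_j) v_j / sum_j K(q_i, k_j).
    With K(q, k) = exp(q.k / sqrt(d)) this is exactly softmax attention."""
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)          # kernel similarities
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # normalize -> softmax
    return weights @ values

def causal_attention(queries, keys, values):
    """Same smoother, but position i may only average over positions
    j <= i. This masking step has no counterpart in classical
    kernel smoothing, which treats all data points symmetrically."""
    n, d = queries.shape
    logits = queries @ keys.T / np.sqrt(d)
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # entries with j > i
    logits[mask] = -np.inf                            # masked weights -> 0
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values
```

Note that at position 0 the causal version can only attend to itself, so its output is exactly `values[0]` regardless of the queries and keys.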

Also, simplicial attention is pretty much what the OP was going for, but the hardware lottery means it's going to be difficult to get competitive in engineering terms, not that people aren't trying (e.g. https://arxiv.org/pdf/2507.02754)
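For anyone unfamiliar, a rough sketch of 2-simplicial attention as I understand it from that line of work (trilinear logits over key *pairs*, so the attention tensor is O(n^3), which is where the hardware-lottery pain comes from). The exact value-combination rule varies by paper; the elementwise product below is one common choice, and the names are mine:

```python
import numpy as np

def two_simplicial_attention(q, k1, k2, v1, v2):
    """Sketch of 2-simplicial attention: the logit for query i is a
    trilinear form over a PAIR of keys (j, k), giving an (n, n, n)
    attention tensor instead of the usual (n, n) matrix."""
    n, d = q.shape
    # l[i,j,k] = sum_d q[i,d] * k1[j,d] * k2[k,d]
    logits = np.einsum('id,jd,kd->ijk', q, k1, k2) / d
    w = np.exp(logits - logits.max(axis=(1, 2), keepdims=True))
    w /= w.sum(axis=(1, 2), keepdims=True)  # softmax over all (j, k) pairs
    # combine value pairs via an elementwise product (one common choice)
    return np.einsum('ijk,jd,kd->id', w, v1, v2)
```

The cubic attention tensor is the crux: standard attention maps cleanly onto batched matmuls that GPUs are built for, while the trilinear form needs tiling tricks (or restriction to local windows) to be hardware-competitive.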
