>>matt_d+(OP)
QM would tell us that the interaction order of your Hamiltonian (attention operator) doesn’t limit the complexity of the wave function (hidden state). It might be more efficient to explicitly correlate certain many-body interactions, but pairwise interactions, depth, and a basis (hidden state dimension) approaching completeness “are all you need”.
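
A toy sketch of the QM side of that analogy, in case it helps: three qubits, operations that act on at most two of them at a time, and a depth of three steps. The result is a GHZ state whose one-body and two-body X expectation values all vanish while the three-body one is 1, so the correlation is genuinely many-body even though nothing in the circuit couples more than a pair. The helper names (`kron_all`, `cnot`, `expect`) are mine, and this only illustrates the physics half of the comparison; it says nothing rigorous about attention.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # Hadamard gate
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])  # projector |0><0|
P1 = np.array([[0.0, 0.0], [0.0, 1.0]])  # projector |1><1|


def kron_all(*ops):
    """Tensor product of single-qubit operators, qubit 0 first."""
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out


def cnot(control, target, n=3):
    """Two-body gate: flip `target` when `control` is |1>."""
    keep = [I2] * n
    flip = [I2] * n
    keep[control] = P0
    flip[control] = P1
    flip[target] = X
    return kron_all(*keep) + kron_all(*flip)


def expect(state, ops):
    """Expectation value <state| ops[0] x ops[1] x ... |state> (real state)."""
    return float(state @ kron_all(*ops) @ state)


# Start in |000> and apply three layers, none touching more than two qubits.
psi = np.zeros(8)
psi[0] = 1.0
psi = kron_all(H, I2, I2) @ psi  # layer 1: one-body
psi = cnot(0, 1) @ psi           # layer 2: pairwise
psi = cnot(1, 2) @ psi           # layer 3: pairwise

one_body = [expect(psi, ops) for ops in ([X, I2, I2], [I2, X, I2], [I2, I2, X])]
two_body = [expect(psi, ops) for ops in ([X, X, I2], [X, I2, X], [I2, X, X])]
three_body = expect(psi, [X, X, X])

print("one-body <X_i>:     ", np.round(one_body, 12))
print("two-body <X_i X_j>: ", np.round(two_body, 12))
print("three-body <X X X>: ", round(three_body, 12))
```

With only the first two layers the third qubit stays uncorrelated, which is the "depth" part of the argument: each pairwise step extends the correlation by one more body.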