FlashAttention-T: Towards Tensorized Attention

>>matt_d+(OP)
QM would tell us the order of your Hamiltonian (attention operator) doesn’t limit the complexity of the wave function (hidden state). It might be more efficient to explicitly correlate certain many-body interactions, but pair-wise interactions, depth and a basis (hidden state dimension) approaching completeness "are all you need”.

>>findal+MX
The terminology is overloaded.. Tensors in QM are objects obeying transformation laws, in ML Tensors are just data arranged in multidimensional arrays. There are no constraints on how the data transforms.

zlacker