zlacker

[return to "FlashAttention-T: Towards Tensorized Attention"]
1. findal+MX[view] [source] 2026-02-04 03:30:30
>>matt_d+(OP)
QM would tell us that the order of your Hamiltonian (attention operator) doesn't limit the complexity of the wave function (hidden state). It might be more efficient to explicitly correlate certain many-body interactions, but pairwise interactions, depth, and a basis (hidden state dimension) approaching completeness "are all you need".
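
Not from the linked paper, just a standard exact-diagonalization illustration of that claim: the Heisenberg chain below contains only pairwise terms, yet its ground state has nonzero half-chain entanglement entropy, so it cannot be factored into independent pieces. The chain length and the helper name site_op are illustrative choices.

    import numpy as np

    # Heisenberg chain H = sum_i S_i . S_{i+1}: every term is strictly
    # pairwise (order-2), yet the ground state is a many-body superposition.
    sx = np.array([[0, 1], [1, 0]]) / 2
    sy = np.array([[0, -1j], [1j, 0]]) / 2
    sz = np.array([[1, 0], [0, -1]]) / 2
    I2 = np.eye(2)

    def site_op(op, i, n):
        """Embed a single-site operator at position i of an n-site chain."""
        mats = [I2] * n
        mats[i] = op
        out = mats[0]
        for m in mats[1:]:
            out = np.kron(out, m)
        return out

    n = 8
    H = sum(site_op(s, i, n) @ site_op(s, i + 1, n)
            for i in range(n - 1) for s in (sx, sy, sz))

    evals, evecs = np.linalg.eigh(H)
    psi0 = evecs[:, 0]                      # ground state of a pairwise Hamiltonian

    # Entanglement entropy across the middle cut: zero for any product state,
    # clearly nonzero here, so no pairwise factorization captures this state.
    rho = psi0.reshape(2**(n // 2), -1)
    svals = np.linalg.svd(rho, compute_uv=False)
    p = svals**2 / np.sum(svals**2)
    print("half-chain entanglement entropy:", -np.sum(p * np.log(p + 1e-16)))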
2. twothr+m51[view] [source] 2026-02-04 04:47:54
>>findal+MX
The terminology is overloaded: tensors in QM are objects obeying transformation laws, while in ML tensors are just data arranged in multidimensional arrays. There are no constraints on how the data transforms.
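
A generic sketch of the distinction, not something from the thread: a physical rank-2 tensor has to transform as T' = R T R^T under a basis change, which is what keeps basis-independent quantities like the trace invariant; an ML array carries no such law and can be reshaped or permuted at will.

    import numpy as np

    rng = np.random.default_rng(0)

    # A QM-style rank-2 tensor: components change covariantly under a rotation,
    # so basis-independent quantities (e.g. the trace) are preserved.
    T = rng.normal(size=(3, 3))
    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    T_rot = R @ T @ R.T
    print(np.trace(T), np.trace(T_rot))      # identical up to float error

    # An ML "tensor" is just an ndarray: nothing enforces any transformation law,
    # so the model is free to permute and reshape it however it likes.
    x = rng.normal(size=(2, 3, 4))
    y = x.transpose(2, 0, 1).reshape(4, 6)   # perfectly legal, no physics implied
    print(y.shape)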
3. findal+781[view] [source] 2026-02-04 05:12:29
>>twothr+m51
Intended as an analogy, but it is essentially a description of the DMRG algorithm (quantum chemistry). Only pairwise operators there, yet the method approaches exactness when there are enough terms in your tensor product (iterations ~ depth) and a large enough embedding dimension.

> There are no constraints on how the data transforms.

Except those implicit in your learned representation. And that representation could be the many-body wave function.
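
A small numerical sketch of the DMRG/MPS point, not from the paper: a chain of pairwise SVD factorizations (a matrix product state) reproduces an arbitrary state exactly once the bond/"embedding" dimension is large enough, and the truncation error shrinks as that dimension grows. The function name mps_truncate, the site count, and the bond dimensions are illustrative.

    import numpy as np

    def mps_truncate(psi, n_sites, chi):
        """Decompose an n-site state (2**n amplitudes) into a matrix-product
        state by sequential SVDs, keeping at most chi singular values per bond,
        then contract it back into a dense vector to measure the error."""
        tensors = []
        rest = psi.reshape(1, -1)              # (bond dim, remaining physical dims)
        for _ in range(n_sites - 1):
            bond = rest.shape[0]
            rest = rest.reshape(bond * 2, -1)  # split off one physical index
            u, s, vh = np.linalg.svd(rest, full_matrices=False)
            keep = min(chi, len(s))            # truncate the bond dimension
            u, s, vh = u[:, :keep], s[:keep], vh[:keep]
            tensors.append(u.reshape(bond, 2, keep))
            rest = s[:, None] * vh             # absorb the weights to the right
        tensors.append(rest.reshape(rest.shape[0], 2, 1))

        # contract the chain back into a full 2**n vector
        out = tensors[0]
        for t in tensors[1:]:
            out = np.tensordot(out, t, axes=([-1], [0]))
        return out.reshape(-1)

    rng = np.random.default_rng(0)
    n = 10
    psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
    psi /= np.linalg.norm(psi)

    # Error drops as the bond ("embedding") dimension grows; chi = 32 is exact
    # for 10 sites, since the largest required bond dimension is 2**5.
    for chi in (2, 8, 32):
        approx = mps_truncate(psi, n, chi)
        print(chi, np.linalg.norm(psi - approx))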
