zlacker

[parent] [thread] 6 comments
1. helloh+(OP)[view] [source] 2026-02-04 17:22:34
I'm not saying whether the paper is correct or not (since I can't tell), but I don't think your argument really holds. Consider applying it to multiplication:

Fundamentally, multiplication needs to look at every pair of digits from the two input numbers. It must be O(n^2): N digits each looking at N other digits is quadratic. Any sub-quadratic multiplication must hence necessarily lose some information.
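
For concreteness, here is a minimal sketch of the schoolbook algorithm that this quadratic intuition describes; the function name and digit layout are just illustrative:

    # Schoolbook multiplication: every digit of a meets every digit of b,
    # so two n-digit numbers cost O(n^2) digit products.
    def schoolbook_multiply(a_digits, b_digits):
        # digits are base 10, least-significant first
        result = [0] * (len(a_digits) + len(b_digits))
        for i, da in enumerate(a_digits):
            for j, db in enumerate(b_digits):
                result[i + j] += da * db
        for k in range(len(result) - 1):   # propagate carries
            result[k + 1] += result[k] // 10
            result[k] %= 10
        return result

    print(schoolbook_multiply([2, 1], [4, 3]))  # 12 * 34 -> [8, 0, 4, 0], i.e. 408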

replies(4): >>action+c9 >>logicc+GZ >>nine_k+8A1 >>sifar+PB1
2. action+c9[view] [source] 2026-02-04 18:00:25
>>helloh+(OP)
Doesn't that have to do with how many bits you allow in the actual calculation in physical reality?
replies(1): >>helloh+zo
3. helloh+zo[view] [source] [discussion] 2026-02-04 19:03:27
>>action+c9
Well, for multiplication, complexity is defined in terms of the number of digits/bits directly. For attention, complexity is defined in terms of the number of input vectors, which are all at fixed precision. I don't understand what happens to the method proposed in the paper at higher precision (since I don't understand the paper), but in reality it doesn't matter, since there is no value in anything over float16 for machine learning.
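
As a rough sketch (assuming plain dot-product attention and numpy; the names are just illustrative), this is why the cost is quoted in the number of input vectors n rather than in bits: the score matrix is n x n whatever the element precision.

    import numpy as np

    # Naive single-head attention over n input vectors of dimension d.
    # The score matrix Q @ K.T is n x n, which is where the quadratic
    # cost in sequence length comes from; float16 vs float32 only
    # changes the constant factor, not the exponent.
    def naive_attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(Q.shape[-1])          # shape (n, n)
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)        # row-wise softmax
        return weights @ V                               # shape (n, d)

    n, d = 8, 4
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(naive_attention(Q, K, V).shape)  # (8, 4)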
4. logicc+GZ[view] [source] 2026-02-04 22:00:52
>>helloh+(OP)
Multiplication has some properties like being cumulative. If we assume the sequence has any specific properties then we no longer have a general sequence model.
replies(1): >>direwo+ji1
5. direwo+ji1[view] [source] [discussion] 2026-02-04 23:45:32
>>logicc+GZ
I think you meant commutative.

Attention also has some specific properties.

And sometimes results are just unexpected. Did you know that anything a Turing machine can do in t time steps, a different Turing machine can do using O(sqrt(t log t)) memory cells? >>44055347

6. nine_k+8A1[view] [source] 2026-02-05 02:02:30
>>helloh+(OP)
Integer multiplication x * y can trivially be done in O(k), where k = log₂(min(x, y)). This is because we can do addition in constant time, adding all bits in parallel.
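
A rough sketch of the shift-and-add scheme behind that count, assuming each addition is a single constant-time step (and looping over the smaller operand so the loop runs about k = log₂(min(x, y)) times):

    # Shift-and-add multiplication: one (shifted) addition per bit of the
    # smaller operand, so about k = log2(min(x, y)) additions. If each
    # addition is a constant-time parallel step, the loop is O(k).
    def shift_and_add(x, y):
        small, big = (x, y) if x <= y else (y, x)
        acc, shift = 0, 0
        while small:
            if small & 1:
                acc += big << shift   # one addition per examined bit
            small >>= 1
            shift += 1
        return acc

    print(shift_and_add(13, 11))  # 143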

By combining many more adding units, we can do (fixed-size) multiplication in constant time, too: https://en.wikipedia.org/wiki/Dadda_multiplier

7. sifar+PB1[view] [source] 2026-02-05 02:16:15
>>helloh+(OP)
Multiplication can be sub-quadratic using Karatsuba's algorithm.
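
A minimal recursive sketch of Karatsuba on plain Python ints (splitting at a power of two; the cutoff is arbitrary): three half-size multiplications instead of four gives O(n^log₂3) ≈ O(n^1.585).

    # Karatsuba: split x = a*2^m + b, y = c*2^m + d, and reuse
    # (a + b)(c + d) to get the cross term from 3 recursive multiplies
    # instead of 4, giving O(n^1.585) overall instead of O(n^2).
    def karatsuba(x, y):
        if x < 16 or y < 16:                        # small base case
            return x * y
        m = max(x.bit_length(), y.bit_length()) // 2
        a, b = x >> m, x & ((1 << m) - 1)
        c, d = y >> m, y & ((1 << m) - 1)
        ac = karatsuba(a, c)
        bd = karatsuba(b, d)
        cross = karatsuba(a + b, c + d) - ac - bd   # = a*d + b*c
        return (ac << (2 * m)) + (cross << m) + bd

    print(karatsuba(1234, 5678))  # 7006652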