zlacker

1. action+ (OP) 2026-02-04 18:00:25
Doesn't that have to do with how many bits you allow in the actual calculation in physical reality?
replies(1): >>helloh+nf
2. helloh+nf 2026-02-04 19:03:27
>>action+(OP)
Well, for multiplication complexity is defined in terms of on the number of digits/bits digits directly. For attention, complexity is defined on terms of the number of input vectors which are all at fixed precision. I don't understand what happens to the method proposed in the paper at higher precision (since I don't understand the paper), but in reality in doesn't matter since there is no value in anything over float16 for machine learning.