
verytr+ (OP) 2026-02-04 00:10:53
TL;DR: a 5% to 17% speedup, achieved by removing a bottleneck: the work is rearranged so that parts of the FlashAttention computation run on different units of the GPU compute core.