zlacker
1. kioku+(OP)
2026-02-04 09:10:10
> Our key insight is to offload critical softmax primitives to idle tensor units, maximizing hardware utilization and throughput.
> … speedups of 1.05–1.17× across diverse attention configurations on Ampere and Hopper GPUs …