zlacker

1. kioku (OP) 2026-02-04 09:10:10
> Our key insight is to offload critical softmax primitives to idle tensor units, maximizing hardware utilization and throughput.

> … speedups of 1.05–1.17× across diverse attention configurations on Ampere and Hopper GPUs …
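
To make "offloading a softmax primitive to tensor units" concrete, here is a rough sketch of one way it could work. It is not taken from the paper. The quotes don't say which primitives are moved or how, so this assumes the row-sum reduction is the one in question and writes it as a product with a ones vector, which is the shape of work a matrix-multiply unit accepts. The names `matmul` and `softmax_rows_via_matmul` are made up for illustration, and plain Python lists stand in for GPU tiles.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists-of-lists (stand-in for a tensor-unit MMA)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows_via_matmul(scores):
    """Row-wise softmax with the row-sum reduction phrased as a matmul.

    Assumption: the row sum is the offloaded primitive. The quoted text
    does not say which primitives are moved or how.
    """
    exps = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps.append([math.exp(v - m) for v in row])
    # Row sums computed as exps @ ones: a reduction expressed as a matrix product.
    ones = [[1.0] for _ in range(len(exps[0]))]
    sums = matmul(exps, ones)
    return [[e / s[0] for e in row] for row, s in zip(exps, sums)]

probs = softmax_rows_via_matmul([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
```

Each row of `probs` sums to 1, same as an ordinary softmax. The only change is where the reduction runs. Whether that trade pays off on real hardware depends on how idle the tensor units actually are, which is the claim the quoted speedups are about.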
