zlacker

1. fheins+(OP) 2026-02-04 18:47:49
Yes, there must be a connection. While adaptive truncation may prove impractical, it should be possible to measure spectral statistics on sample data and then specify a different fixed truncation order per layer, per head, etc. The GitHub repository lists many other possible improvements: https://github.com/glassroom/sata_attention#proof-of-concept
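To make the per-head idea concrete, here is a minimal sketch, not code from the repository. It assumes the "spectral statistic" is the spectral norm of each head's scaled query-key logits on a sample batch, and that the truncation order is the order of a Taylor approximation of exp. Both are my assumptions, and the helper names `choose_order` and `per_head_orders` are hypothetical. The spectral norm upper-bounds every individual logit, so the Lagrange remainder of exp on that range gives a conservative order for each head.

```python
import math
import numpy as np


def choose_order(bound, tol=1e-3, max_order=32):
    """Smallest Taylor order P such that the Lagrange remainder bound
    exp(bound) * bound**(P + 1) / (P + 1)! for exp(x) on |x| <= bound
    is at most tol. Capped at max_order (an illustrative safety limit)."""
    if bound <= 0.0:
        return 0
    log_tol = math.log(tol)
    for order in range(max_order + 1):
        # Work in log space so large bounds do not overflow.
        log_remainder = (
            bound + (order + 1) * math.log(bound) - math.lgamma(order + 2)
        )
        if log_remainder <= log_tol:
            return order
    return max_order


def per_head_orders(q, k, tol=1e-3, max_order=32):
    """q, k: sample activations of shape (layers, heads, tokens, d_head).
    Returns an integer array of shape (layers, heads) with one fixed
    truncation order per head."""
    d_head = q.shape[-1]
    logits = q @ np.swapaxes(k, -1, -2) / math.sqrt(d_head)
    # Largest singular value per (layer, head); it bounds every |logit|.
    spectral_norm = np.linalg.svd(logits, compute_uv=False)[..., 0]
    pick = np.vectorize(
        lambda b: choose_order(float(b), tol, max_order), otypes=[int]
    )
    return pick(spectral_norm)


# Usage on synthetic data: the second head has 20x larger queries.
rng = np.random.default_rng(0)
q = rng.normal(size=(1, 2, 16, 8)) * 0.1
k = rng.normal(size=(1, 2, 16, 8)) * 0.1
q[0, 1] = 20.0 * q[0, 0]
k[0, 1] = k[0, 0]
orders = per_head_orders(q, k)
```

The spectral norm is a loose bound, so the resulting orders are pessimistic. A quantile of the observed logit magnitudes would likely give smaller orders at the cost of occasional larger errors.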