zlacker

1. halcyo+ (OP) 2026-02-03 18:42:51
What am I missing here? I thought this model needs 46GB of unified memory for a 4-bit quant, and the Radeon RX 7900 XTX has 24GB of memory, right? Hoping to get some insight, thanks in advance!
replies(1): >>coder5+T2
2. coder5+T2 2026-02-03 18:52:42
>>halcyo+(OP)
MoE models can be split efficiently into dense weights (attention, embeddings, etc., plus the KV cache) and sparse expert (MoE) weights. By keeping the dense weights on the GPU and offloading the sparse expert weights to slower CPU RAM, you can still get surprisingly decent performance out of a lot of MoEs: only a few experts are active per token, so the CPU side only has to stream a small fraction of the offloaded weights each step.
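
As a back-of-the-envelope sketch of why this works (every number below is an illustrative assumption, not a measurement of any particular model or machine):

    # Hypothetical split for a 46GB (4-bit) MoE model on a 24GB GPU.
    # All constants are assumptions for illustration only.
    total_gib   = 46.0   # full 4-bit quant, from the parent comment
    dense_frac  = 0.25   # assumed share held by attention/embeddings/norms
    active_frac = 0.10   # assumed share of expert weights read per token

    dense_gib     = total_gib * dense_frac    # stays in 24GB VRAM
    expert_gib    = total_gib - dense_gib     # offloaded to CPU RAM
    per_token_gib = expert_gib * active_frac  # streamed from RAM per token

    ram_bw = 60.0  # GiB/s, assumed dual-channel DDR5 bandwidth
    print(f"VRAM: {dense_gib:.1f} GiB, RAM: {expert_gib:.1f} GiB")
    print(f"RAM-bandwidth ceiling: ~{ram_bw / per_token_gib:.0f} tok/s")

With those made-up numbers, the ~11.5GB of dense weights fits in 24GB of VRAM with room left for the KV cache, and generation speed is capped by how fast RAM can feed the ~3.5GB of expert weights touched per token. llama.cpp exposes this kind of split through its --override-tensor/-ot flag (a regex mapping tensor names to a backend, e.g. expert tensors to CPU), if you want to try it.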

Not as good as running the entire thing on the GPU, of course.
