
1. Muffin+(OP) 2026-02-05 02:01:59
Thank you. Could you give a tl;dr rough estimate along the lines of "the full model needs ____ this much VRAM, and if you use ____, the most common quantization method, it will run in ____ this much VRAM"?
replies(1): >>omneit+ej
2. omneit+ej 2026-02-05 05:07:23
>>Muffin+(OP)
It’s a trivial calculation to make (+/- 10%).

Number of params == “variables” in memory

VRAM footprint ~= number of params * size of a param

So a 4B model at 8 bits takes roughly 4GB of VRAM, give or take: the GB figure matches the param count in billions. At 4 bits it's roughly 2GB, and so on. Kimi is about 512GB at 4 bits.
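A minimal sketch of that back-of-the-envelope math (weights only, ignoring KV cache and runtime overhead, hence the +/- 10%):

def vram_gb(params_billion: float, bits_per_param: int) -> float:
    """Rough VRAM in GB needed just to hold the weights."""
    bytes_per_param = bits_per_param / 8
    # 1B params at 1 byte each ~= 1 GB
    return params_billion * bytes_per_param

print(vram_gb(4, 8))     # 4B model at 8-bit  -> ~4 GB
print(vram_gb(4, 4))     # 4B model at 4-bit  -> ~2 GB
print(vram_gb(1000, 4))  # ~1T-param model at 4-bit -> ~500 GB (Kimi ballpark)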
