zlacker

[parent] [thread] 2 comments
1. mirekr+(OP)[view] [source] 2026-02-03 20:24:55
It takes download time + 1 minute to test the speed yourself, and you can try different quants. It's hard to write down a table because the numbers depend on your system (RAM clock etc.) once the model no longer fits entirely in GPU memory.

I guess it would make sense to have something like a list of the max context size and quants that fit fully on common configs: single GPU, dual GPUs, unified RAM on a Mac, etc.
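
A rough sketch of how such a "does it fit" estimate could be computed, in Python. Everything here is an assumption for illustration: the function names, the 1 GiB overhead allowance, and the example model shape (8B parameters, 32 layers, 8 KV heads, head dim 128) are hypothetical, and the bits-per-weight value is whatever your chosen quant averages out to. It only counts weights plus an fp16 KV cache, so treat the result as a ballpark, not a guarantee.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB (one K and one V tensor per layer)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / GIB


def max_context_that_fits(mem_gib: float, n_params_billion: float,
                          bits_per_weight: float, n_layers: int,
                          n_kv_heads: int, head_dim: int,
                          overhead_gib: float = 1.0,
                          bytes_per_elem: int = 2) -> int:
    """Largest context length whose KV cache fits next to the weights.

    overhead_gib is an assumed allowance for compute buffers and the runtime.
    Returns 0 if the weights alone do not fit.
    """
    free = mem_gib - overhead_gib - weights_gib(n_params_billion, bits_per_weight)
    if free <= 0:
        return 0
    per_token_gib = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem / GIB
    return int(free / per_token_gib)


if __name__ == "__main__":
    # Hypothetical 8B model at 8 bits per weight on a 24 GiB GPU.
    print(max_context_that_fits(24, 8, 8, n_layers=32, n_kv_heads=8, head_dim=128))
```

Running the same function over a handful of memory sizes and quants would give the kind of table described above, but speed once layers spill to system RAM still has to be measured on the actual machine.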

replies(1): >>Keats+p8
2. Keats+p8[view] [source] 2026-02-03 21:06:00
>>mirekr+(OP)
Testing speed is easy, yes. I'm mostly wondering about the quality difference between Q6 and Q8_K_XL, for example.
replies(1): >>daniel+0O
3. daniel+0O[view] [source] [discussion] 2026-02-04 00:57:00
>>Keats+p8
I haven't run benchmarks yet (I plan to), but the results should be similar to those in our post on DeepSeek-V3.1 Dynamic GGUFs: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs