zlacker

1. ranger+(OP)[view] [source] 2026-02-03 16:27:45
What is the difference between the UD and non-UD files?
replies(1): >>daniel+E
2. daniel+E[view] [source] 2026-02-03 16:29:55
>>ranger+(OP)
UD stands for "Unsloth-Dynamic", which upcasts important layers to higher bits. Non-UD is just standard llama.cpp quants. Both still use our calibration dataset.
replies(2): >>Camper+Ed >>ranger+Vo1
◧◩
3. Camper+Ed[view] [source] [discussion] 2026-02-03 17:24:32
>>daniel+E
Please consider authoring a single, straightforward introductory-level page somewhere that explains what all the filename components mean, and who should use which variants.

The green/yellow/red indicators for different levels of hardware support are really helpful, but far from enough IMO.

replies(2): >>daniel+ei >>segmon+vm
◧◩◪
4. daniel+ei[view] [source] [discussion] 2026-02-03 17:40:51
>>Camper+Ed
Oh, good idea! I generally recommend UD-Q4_K_XL (Unsloth Dynamic, 4-bit, Extra Large) for most hardware. MXFP4_MOE is also OK.
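As a rough illustration of how a tag such as UD-Q4_K_XL breaks down, here is a minimal Python sketch. It is not an Unsloth or llama.cpp API: `parse_quant_tag` and its regex are hypothetical, and the sketch assumes that the S/M/L suffixes follow the usual llama.cpp k-quant naming, with XL read as "Extra Large" as described above. Tags it does not recognize, such as MXFP4_MOE, return `None` rather than a guess.

```python
import re

# Hypothetical helper for illustration only; not part of Unsloth or llama.cpp.
# Assumed size suffixes: S/M/L as used by llama.cpp k-quants, plus XL.
SIZE_NAMES = {"S": "Small", "M": "Medium", "L": "Large", "XL": "Extra Large"}

# Optional "UD-" prefix, "Q<bits>", optional family ("K", "0" or "1"),
# optional size suffix. "XL" is listed first so it is not read as "L".
TAG_RE = re.compile(
    r"^(?P<ud>UD-)?Q(?P<bits>\d+)(?:_(?P<family>K|0|1))?(?:_(?P<size>XL|S|M|L))?$"
)


def parse_quant_tag(tag):
    """Split a quant tag into its parts, or return None if it is not recognized."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    size = match.group("size")
    return {
        "dynamic": match.group("ud") is not None,  # True for Unsloth Dynamic (UD)
        "bits": int(match.group("bits")),          # nominal bit width
        "family": match.group("family"),           # "K" for k-quants, "0"/"1" for legacy
        "size": SIZE_NAMES.get(size) if size else None,
    }


if __name__ == "__main__":
    for tag in ("UD-Q4_K_XL", "Q8_K_XL", "Q8_0", "MXFP4_MOE"):
        print(tag, "->", parse_quant_tag(tag))
```

The nominal bit width is only a label: in a UD file some layers are stored at higher precision, so the average bits per weight will differ from the number in the tag.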
replies(1): >>Keats+nI
◧◩◪
5. segmon+vm[view] [source] [discussion] 2026-02-03 17:57:00
>>Camper+Ed
The green/yellow/red indicators are based on what you set for your hardware on huggingface.
◧◩◪◨
6. Keats+nI[view] [source] [discussion] 2026-02-03 19:20:08
>>daniel+ei
Is there some indication of how the different bit quantizations affect performance? I.e., I have a 5090 + 96GB, so I want to get the best possible model, but I don't care about getting 2% better perf if I only get 5 tok/s.
replies(1): >>mirekr+gX
◧◩◪◨⬒
7. mirekr+gX[view] [source] [discussion] 2026-02-03 20:24:55
>>Keats+nI
It takes download time + 1 minute to test speed yourself, and you can try different quants. It's hard to write down a table because, if the model doesn't fit entirely on the GPU, the result depends on your system (e.g. RAM clock).

I guess it would make sense to have something like max context size/quants that fit fully on common configs: single GPU, dual GPUs, unified RAM on Mac, etc.
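A first approximation of such a "what fits" table can be done with arithmetic alone. The Python sketch below is a back-of-the-envelope estimate under stated assumptions, not a measurement: it assumes standard attention with an unquantized 16-bit KV cache, treats the GGUF file size as the weight footprint, and reserves a made-up 1 GiB for compute buffers. The function names and the example model shape (48 layers, 8 KV heads, head dimension 128, 20 GiB file) are hypothetical. Models with sliding-window, latent, or hybrid attention will not follow this formula.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Size of the KV cache: one key and one value tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def max_context_that_fits(model_file_bytes, vram_bytes, n_layers, n_kv_heads,
                          head_dim, bytes_per_elem=2, overhead_bytes=1 * GIB):
    """Largest context (in tokens) whose KV cache fits next to the weights.

    overhead_bytes is an assumed allowance for compute buffers, not a measured value.
    Returns 0 if the weights alone do not fit.
    """
    free = vram_bytes - model_file_bytes - overhead_bytes
    if free <= 0:
        return 0
    per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bytes_per_elem)
    return free // per_token


if __name__ == "__main__":
    # Hypothetical model shape and file size on a 32 GiB GPU.
    tokens = max_context_that_fits(
        model_file_bytes=20 * GIB,
        vram_bytes=32 * GIB,
        n_layers=48,
        n_kv_heads=8,
        head_dim=128,
    )
    print("approximate max context:", tokens)
```

Running the same calculation for each quant's file size would give one row per quant. Actual limits still need to be checked by loading the model, since runtime buffers vary by backend and settings.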

replies(1): >>Keats+F51
◧◩◪◨⬒⬓
8. Keats+F51[view] [source] [discussion] 2026-02-03 21:06:00
>>mirekr+gX
Testing speed is easy, yes. I'm mostly wondering about the quality difference between Q6 and Q8_K_XL, for example.
replies(1): >>daniel+gL1
◧◩
9. ranger+Vo1[view] [source] [discussion] 2026-02-03 22:47:35
>>daniel+E
What is your definition of "important" in this context?
replies(1): >>daniel+7L1
◧◩◪
10. daniel+7L1[view] [source] [discussion] 2026-02-04 00:56:09
>>ranger+Vo1
Oh we wrote about it here: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs
◧◩◪◨⬒⬓⬔
11. daniel+gL1[view] [source] [discussion] 2026-02-04 00:57:00
>>Keats+F51
I haven't done benchmarking yet (I plan to), but it should be similar to our post on DeepSeek-V3.1 Dynamic GGUFs: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs