zlacker

1. simonw+TH 2026-02-03 18:51:23
>>daniel+(OP)
I got this running locally using llama.cpp from Homebrew and the Unsloth quantized model like this:

  brew upgrade llama.cpp # or brew install if you don't have it yet
Then:

  llama-cli \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
    --fit on \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40 \
    --jinja
That opened a CLI chat interface. For a web UI on port 8080, along with an OpenAI-compatible chat completions endpoint, run this instead:

  llama-server \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
    --fit on \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40 \
    --jinja
It's using about 28GB of RAM.
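Once the server is up you can hit the OpenAI-compatible endpoint directly. A quick sanity check with curl (the model name here is just a placeholder, llama-server answers with whatever model it loaded):

  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "qwen3-coder-next",
      "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
      ]
    }'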
2. techno+mb1 2026-02-03 21:03:08
>>simonw+TH
What are your impressions?
3. simonw+wy1 2026-02-03 23:05:28
>>techno+mb1
I got Codex CLI running against it and was sadly very unimpressed - it got stuck in a loop running "ls" for some reason when I asked it to create a new file.
4. Camper+WN5 2026-02-05 02:47:35
>>simonw+wy1
You've probably seen it by now, but there was a llama.cpp issue, fixed earlier today(?), that caused looping and other sub-par results. You need to update llama-server and, for certain quants, redownload the GGUFs.

https://old.reddit.com/r/unsloth/comments/1qvt6qy/qwen3coder...
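In case it saves anyone a step, a rough sketch of the update, assuming the Homebrew install from upthread and llama.cpp's default download cache location on macOS (check the path and filenames before deleting anything):

  brew upgrade llama.cpp
  # list the GGUFs llama.cpp downloaded via -hf
  ls ~/Library/Caches/llama.cpp/
  # remove the Qwen3-Coder-Next files so the fixed quants get re-fetched
  # (filenames may differ; match on the repo name)
  rm ~/Library/Caches/llama.cpp/unsloth_Qwen3-Coder-Next-GGUF*
  # re-running the llama-server command from upthread will redownload the model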
