zlacker

1. simonw+TH 2026-02-03 18:51:23
>>daniel+(OP)
I got this running locally using llama.cpp from Homebrew and the Unsloth quantized model like this:

  brew upgrade llama.cpp # or brew install if you don't have it yet
Then:

  llama-cli \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
    --fit on \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40 \
    --jinja
That opened a CLI chat interface. For a web UI on port 8080, along with an OpenAI-compatible chat completions endpoint, run this instead:

  llama-server \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
    --fit on \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40 \
    --jinja
It's using about 28GB of RAM.
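Once the server is up you can hit the OpenAI-compatible endpoint directly. A quick sanity check with curl (the model name here is just a placeholder, llama-server answers with whatever model it loaded):

  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "qwen3-coder-next",
      "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
      ]
    }'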
2. techno+mb1 2026-02-03 21:03:08
>>simonw+TH
What are your impressions?
3. simonw+wy1 2026-02-03 23:05:28
>>techno+mb1
I got Codex CLI running against it and was sadly very unimpressed - it got stuck in a loop running "ls" for some reason when I asked it to create a new file.
4. Camper+WN5 2026-02-05 02:47:35
>>simonw+wy1
You've probably seen it by now, but there was a llama.cpp issue, fixed earlier today(?), that caused looping and other sub-par results. You need to update llama-server and, for certain quants, redownload the GGUFs.

https://old.reddit.com/r/unsloth/comments/1qvt6qy/qwen3coder...
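In case it saves anyone a step, a rough sketch of the update, assuming the Homebrew install from upthread and llama.cpp's default download cache location on macOS (check the path and filenames before deleting anything):

  brew upgrade llama.cpp
  # list the GGUFs llama.cpp downloaded via -hf
  ls ~/Library/Caches/llama.cpp/
  # remove the Qwen3-Coder-Next files so the fixed quants get re-fetched
  # (filenames may differ; match on the repo name)
  rm ~/Library/Caches/llama.cpp/unsloth_Qwen3-Coder-Next-GGUF*
  # re-running the llama-server command from upthread will redownload the model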
