zlacker

[return to "Qwen3-Coder-Next"]
1. simonw+l3 2026-02-03 16:15:21
>>daniel+(OP)
This GGUF is 48.4GB - https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF/tree/main/... - which should be usable on higher-end laptops.

I still haven't experienced a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude Code well enough to be useful.

Maybe this will be the one? This Unsloth guide from a sibling comment suggests it might be: https://unsloth.ai/docs/models/qwen3-coder-next
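
If it does turn out to be the one, the quickest way I know to poke at a GGUF like this is llama-cpp-python. Untested sketch, and the filename is a placeholder since I haven't checked what's actually in that repo:

    # Untested sketch: fetch one GGUF file from the repo and run a quick
    # local smoke test with llama-cpp-python. The filename below is a
    # placeholder, check the actual file listing on Hugging Face.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    path = hf_hub_download(
        repo_id="Qwen/Qwen3-Coder-Next-GGUF",
        filename="qwen3-coder-next-q4_k_m.gguf",  # placeholder name
    )

    llm = Llama(model_path=path, n_ctx=8192, n_gpu_layers=-1)  # -1 offloads all layers
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a Python function that reverses a string."}]
    )
    print(out["choices"][0]["message"]["content"])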

2. embedd+R7 2026-02-03 16:33:40
>>simonw+l3
> I still haven't experienced a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude Code well enough to be useful

I've had mild success with GPT-OSS-120b (MXFP4, ends up taking ~66GB of VRAM for me with llama.cpp) and Codex.
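
For anyone trying to reproduce this: llama-server exposes an OpenAI-compatible endpoint, so a quick way to sanity-check the setup before wiring up Codex is something like this (untested sketch; the port and model name depend on how you launched the server):

    # Quick sanity check against a local llama-server (llama.cpp) instance,
    # which serves an OpenAI-compatible API under /v1. The port depends on
    # how the server was launched; the api_key just has to be non-empty.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")
    resp = client.chat.completions.create(
        model="gpt-oss-120b",  # assumed name; llama-server typically ignores it
        messages=[{"role": "user", "content": "Reply with OK if you can hear me."}],
    )
    print(resp.choices[0].message.content)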

I'm wondering whether one could crowdsource chat logs from GPT-OSS-120b running with Codex, then use the good runs to seed a post-training pass that fine-tunes the 20b variant, and whether that would make a big difference. With reasoning_effort set to high, both models are actually quite good compared to other downloadable models, but the 120b is just about out of reach for 64GB, so making the 20b better at specific use cases seems like it would be useful.
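
To make the data side concrete, this is roughly what I have in mind: keep only the sessions that actually succeeded and flatten them into a chat-format JSONL for a fine-tuning run. Completely untested, and every field name here (task_succeeded, messages) is made up for illustration; real logs would need their own schema mapping:

    # Illustrative sketch only: filter 120b/Codex session logs down to the
    # "good" runs and emit a chat-format JSONL that a fine-tuning pipeline
    # for the 20b could consume. The field names (task_succeeded, messages)
    # are hypothetical placeholders, not a real log schema.
    import json

    def build_sft_dataset(log_path: str, out_path: str) -> int:
        kept = 0
        with open(log_path) as src, open(out_path, "w") as dst:
            for line in src:
                session = json.loads(line)
                # Keep only sessions flagged as successful, whatever success
                # signal the logs carry (tests passed, diff accepted, etc.).
                if not session.get("task_succeeded"):
                    continue
                dst.write(json.dumps({"messages": session["messages"]}) + "\n")
                kept += 1
        return kept

    if __name__ == "__main__":
        n = build_sft_dataset("codex_120b_sessions.jsonl", "sft_seed.jsonl")
        print(f"kept {n} good sessions")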

3. gigate+Ob 2026-02-03 16:48:18
>>embedd+R7
I have a 128GB M3 Max MacBook Pro. Running the GPT-OSS model on it via LM Studio, once the context gets large enough the fans spin up to 100% and it's unbearable.

4. embedd+Yt 2026-02-03 18:01:00
>>gigate+Ob
Yeah, Apple hardware doesn't seem ideal for large LLMs. Give it a go with a dedicated GPU if you're so inclined and you'll see a big difference :)

5. marci+AJ2 2026-02-04 08:45:34
>>embedd+Yt
Their issue with the Mac was the sound of the fans spinning. I doubt a dedicated GPU will resolve that.