zlacker

[return to "Qwen3-Coder-Next"]
1. simonw+l3[view] [source] 2026-02-03 16:15:21
>>daniel+(OP)
This GGUF is 48.4GB - https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF/tree/main/... - which should be usable on higher-end laptops.

I still haven't experienced a local model that fits on my 64GB MacBook Pro and can run a coding agent like Codex CLI or Claude Code well enough to be useful.

Maybe this will be the one? This Unsloth guide from a sibling comment suggests it might be: https://unsloth.ai/docs/models/qwen3-coder-next
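
For anyone who wants a quick smoke test before wiring it into an agent - a minimal sketch, assuming you serve the GGUF with llama.cpp's llama-server (which exposes an OpenAI-compatible endpoint) and have the openai Python package installed; the model filename and port here are placeholders:

    # Serve the model first (filename/port are assumptions):
    #   llama-server -m Qwen3-Coder-Next-Q4_K_M.gguf --port 8080
    from openai import OpenAI

    # llama-server speaks the OpenAI wire protocol; the API key is unused
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

    resp = client.chat.completions.create(
        model="local",  # llama-server accepts any model name
        messages=[{"role": "user",
                   "content": "Write a Python function that reverses a linked list."}],
    )
    print(resp.choices[0].message.content)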

◧◩
2. kristo+R51[view] [source] 2026-02-03 20:36:33
>>simonw+l3
We need a new word: not "local model" but "my own computer's model", i.e. CapEx-based.

This distinction is important because some "we support local models" tools have things like Ollama orchestration, or use the llama.cpp libraries, to connect only to models on the same physical machine.

That's not my definition of local. Mine is "local network", so call it the "LAN model" until we come up with something better. "Self-host" exists, but that usually means "open weights" rather than saying anything about the hardware the model has to fit on.

It should be defined as ~sub-$10k, using Steve Jobs's megapenny unit.

Essentially, classify models by how many megapennies of spend buy a machine that won't OOM on them.

That's what I mean when I say local: running inference for 'free' somewhere on hardware I control that costs at most single-digit thousands of dollars. And, if I were feeling fancy, that I could potentially fine-tune on a timescale of days.

A modern 5090 build-out with a Threadripper, NVMe storage, and 256GB RAM will run you about $10k +/- $1k. The MLX route is about $6,000 out the door after tax (M3 Ultra, 60-core, with 256GB).
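
To make the unit concrete: 1 megapenny = 1,000,000 pennies = $10,000, so the 5090 build is ~1.0 megapennies and the M3 Ultra route is ~0.6. A toy sketch of the classification (the <=1 cutoff and the tier label are my reading of the proposal, not a standard):

    # Toy "megapennies of spend" classifier; the <=1 cutoff is the
    # ~sub-$10k definition from above, not an established standard.
    def megapennies(usd: float) -> float:
        return usd / 10_000  # 1,000,000 pennies == $10,000

    for name, price in [("5090 + Threadripper build", 10_000),
                        ("M3 Ultra 60-core / 256GB", 6_000)]:
        mp = megapennies(price)
        print(f"{name}: {mp:.1f} megapennies -> "
              f"{'local' if mp <= 1 else 'not local'}")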

Lastly, it's not just "number of parameters". Not all 32B Q4_K_M models load at the same rate or use the same amount of memory. The internal architecture matters, and active-parameter count + quantization is becoming a poorer approximation of real-world cost given recent SOTA innovations.

What might be needed is a standardized eval benchmark run against standardized hardware classes, with basic real-world tasks like tool calling, code generation, and document processing. There are plenty of "good enough" models out there for a large category of everyday tasks; now I want to find out which runs best.

Take a Gen 6 ThinkPad P14s/MacBook Pro and a 5090/Mac Studio, run the benchmark, and then we can report something like time-to-first-token / tokens-per-second / memory-used / total-test-time, rated independently of how accurate the model was.
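
A rough sketch of what one row of that harness could look like, assuming an OpenAI-compatible local server (llama-server, LM Studio, etc.) on localhost:8080; memory-used needs a platform-specific probe, so it's left out here:

    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

    def bench(prompt: str) -> dict:
        t0 = time.perf_counter()
        ttft, tokens = None, 0
        stream = client.chat.completions.create(
            model="local", stream=True,
            messages=[{"role": "user", "content": prompt}],
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                if ttft is None:
                    ttft = time.perf_counter() - t0  # time to first token
                tokens += 1  # chunk count ~= token count for llama-server
        total = time.perf_counter() - t0
        if ttft is None:  # model produced no content
            ttft = total
        return {"ttft_s": round(ttft, 3),
                "tok_per_s": round(tokens / max(total - ttft, 1e-9), 1),
                "total_s": round(total, 3)}

    print(bench("Emit a JSON tool call that lists the files in /tmp."))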

◧◩◪
3. echelo+lg1[view] [source] 2026-02-03 21:29:46
>>kristo+R51
I don't even need "open weights" to run on hardware I own.

I am fine renting an H100 (or whatever), as long as I theoretically have access to and own everything running.

I do not want my career to become dependent upon Anthropic.

Honestly, the best thing for "open" might be for us to build open pipes, services, and models that we can run on rented cloud hardware. Large models will outpace small models: LLMs, video models, "world" models, etc.

I'd even be fine time-sharing a running instance of a large model in a large cloud. As long as all the constituent pieces are open where I could (in theory) distill it, run it myself, spin up my own copy, etc.

I do not deny that big models are superior. But I worry about the power the large hyperscalers are getting while we focus on small "open" models that really can't match the big ones.

We should focus on competing with large models, not artisanal homebrew stuff that is irrelevant.

◧◩◪◨
4. Aurorn+lt1[view] [source] 2026-02-03 22:37:46
>>echelo+lg1
> I do not want my career to become dependent upon Anthropic

As someone who switches between Anthropic and ChatGPT depending on the month and has dabbled with other providers and some local LLMs, I think this fear is unfounded.

It's really easy to switch between models. The different models have differences you notice over time, but the techniques you learn in one place aren't going to lock you into any one provider.

◧◩◪◨⬒
5. echelo+Vw1[view] [source] 2026-02-03 22:56:47
>>Aurorn+lt1
> It's really easy to switch between models. The different models have some differences that you notice over time but the techniques you learn in one place aren't going to lock you into a provider anywhere.

We have two cell phone providers. Google is removing the ability to install binaries, and the other one has never allowed freedom. All computing is taxed, defaults are set to the incumbent monopolies. Searching, even for trademarks, is a forced bidding war. Businesses have to shed customer relationships, get poached on brand relationships, and jump through hoops week after week. The FTC/DOJ do nothing, and the EU hasn't done much either.

I can't even imagine what this will be like for engineering once these tools become necessary to do our jobs. We've been spoiled by not needing many tools - other industries, like medical or industrial research, tie employment to a physical location and a set of expensive industrial tools. You lose your job, you have to physically move - possibly to another state.

What happens when Anthropic and OpenAI ban you? Or decide to only sell to industry?

This is just the start - we're going to become more dependent upon these tools to the point we're serfs. We might have two choices, and that's demonstrably (with the current incumbency) not a good world.

Computing is quickly becoming a non-local phenomenon. Google and the platforms broke the dream of the open web. We're about to witness the death of the personal computer if we don't do anything about it.
