zlacker

It is a bug in MLX that was fixed a few days ago: https://github.com/ml-explore/mlx/pull/3083

replies(4): >>embedd+I >>zozbot+52 >>syntax+55 >>liuliu+f82

>>zcbenz+(OP)
Blog post dated 28 Jan 2026, the bug fix posted 29 Jan 2026, so I guess this story had a happy ending :)

Still, it's a sad state of affairs that Apple still seems to be fixing bugs based on which blog posts get the most attention on the internet, but I guess once they started that approach, it's hard to stop and go back to figuring out priorities on their own.

replies(6): >>jckahn+42 >>syntax+n4 >>llm_ne+b7 >>dahcry+Tj >>mrtest+Uk >>rafael+ao

>>embedd+I
Just goes to show that attention is all you need.

replies(1): >>tensil+hv4

>>zcbenz+(OP)
So the underlying issue is that the iPhone 16 Pro SKU was misdetected as having Neural Accelerator (nax) support and this caused silently wrong results. Not a problem with the actual hardware.

replies(2): >>llm_ne+T6 >>TimByt+9l

>>embedd+I
I don’t think so. You can see the issue ticket linked in the PR. Whether that issue ticket is related to the blog post is unknown https://github.com/ml-explore/mlx-swift-examples/issues/462

>>zcbenz+(OP)
Kinda sucks how it seems like there’s no CI that runs on hardware.

>>zozbot+52
Apple's documentation is utter garbage, but this code almost seems like a separate issue (and notably the MLX library uses loads of undocumented properties in Metal, which isn't cool). It looks like the code used to allow the NAX kernel on the iPhone 17 or upcoming 18 if you're on 26.2 or later, and the change instead only allows it on the iPhone 17 Pro or upcoming 18. I'm fairly sure the GPU arch on the A19 is 17. That's notable because the A19 Pro in the 17 Pro has a significantly changed GPU, including GPU tensor cores. The only real change here is that it limits the kernel to the Pro variants for the "17" model.

replies(2): >>zozbot+U7 >>pjmlp+9o

>>embedd+I
MLX is a fairly esoteric library seeing very little usage, mostly to try to foment a broader NN space on Apple devices. This isn't something that is widely affecting people, and most people simply aren't trying to run general LLMs on their iPhone.

I don't think that fix is specific to this, but it's absolutely true that MLX is trying to lever every advantage it can find on specific hardware, so it's possible it made a bad choice on a particular device.

>>llm_ne+T6
> The neural accelerator exists in iPhones going back many years.

What has existed before is the Apple Neural Engine (ANE) which is very different from the newer Neural Accelerator support within the GPU blocks. In fact MLX does not even support ANE yet since at least in previous versions it was hardware-limited to computing FP16 and INT8 MADDs, and not even that fast.

replies(1): >>llm_ne+Ce

>>zozbot+U7
Sure, I directly and explicitly talked about Apple's version of tensor cores in the GPU. But the ANE is by every definition a neural accelerator. Yes, I'm aware of Apple's weird branding for their tensor cores.

"In fact MLX does not even support ANE yet"

I didn't say otherwise. The ANE is a fantastic unit for small, power-efficient models, like extracting text from images, doing depth modelling, etc. It's not made for LLMs, or the other sorts of experimental stuff MLX is intended for. Though note that MLX's author's reason for not supporting the ANE is that it has a "closed-source" API (https://github.com/ml-explore/mlx/issues/18#issuecomment-184...), making it unsuitable for an open-source project, and given that MLX didn't want to just lean on CoreML. But anyways, the ANE is fantastically fast at what it does, while sipping juice.

In any case, the code change shown should have zero impact on the running of MLX on an iPhone 16 Pro. MLX tries to really leverage platform optimizations, so maybe another bifurcation is making the wrong choice.

replies(1): >>zozbot+vk

>>embedd+I
I think you overestimate the power of a blogpost and the speed of bugfixing at Apple for something like this.

I almost guarantee there is no way they can read this blogpost, escalate it internally, get the appropriate approval to the work item, actually work on the fix, get it through QA and get it live in production in 3 days. That would only happen on really critical issues, and this is definitely not critical enough for that.

replies(3): >>embedd+nm >>spaced+Sv >>tensil+Sq4

>>llm_ne+Ce
The change's effects are dependent on what each SKU reports as its Metal architecture, both as identifying string (the equivalent to running 'metal-arch' in the Mac CLI) and as generation 'gen' number. Most likely you're misinterpreting the change as not affecting the iPhone 16 Pro, where in fact it does.
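To make the point about gating concrete, here is a minimal sketch of how a check keyed only on the generation number differs from one that also looks at the identifying string. Everything in it is an assumption for illustration: the `ReportedArch` type, the `example-g…` strings, the generation numbers, and both gate functions are made up and are not MLX's code or real Metal values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedArch:
    name: str  # placeholder identifying string, not a real Metal value
    gen: int   # placeholder generation number


def gate_by_gen(arch: ReportedArch, min_gen: int) -> bool:
    """Loose gate: any device reporting at least min_gen passes."""
    return arch.gen >= min_gen


def gate_by_gen_and_variant(arch: ReportedArch, min_gen: int, suffix: str) -> bool:
    """Stricter gate: newer generations pass outright, while the minimum
    generation passes only if its string carries the expected suffix."""
    if arch.gen > min_gen:
        return True
    return arch.gen == min_gen and arch.name.endswith(suffix)


# Hypothetical devices; what real SKUs report is exactly the open question.
devices = [
    ReportedArch("example-g16p", 16),
    ReportedArch("example-g17", 17),
    ReportedArch("example-g17p", 17),
    ReportedArch("example-g18", 18),
]

loose = [d.name for d in devices if gate_by_gen(d, 17)]
strict = [d.name for d in devices if gate_by_gen_and_variant(d, 17, "p")]
```

Which devices land in which list depends entirely on what each SKU actually reports, so a device that reports an unexpected generation would pass the loose gate without anyone intending it.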

The MLX folks have various rationales for not supporting the ANE (at least as of yet), but one of them is that any real support requires implementing explicit splits in the graph of computations, where ANE-suitable portions are to be dispatched to the ANE and everything else goes back to the GPUs. That's not necessarily trivial.
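A rough sketch of what such a split looks like in the simplest possible case, a linear list of ops rather than a real computation graph. The op names and the `ANE_FRIENDLY` set are hypothetical and say nothing about what the ANE actually supports:

```python
from itertools import groupby

# Assumption for illustration only: which ops count as "ANE-suitable".
ANE_FRIENDLY = {"conv", "matmul_fp16"}


def backend_for(op: str) -> str:
    """Pick a backend for a single op based on the hypothetical set above."""
    return "ane" if op in ANE_FRIENDLY else "gpu"


def split_by_backend(ops: list[str]) -> list[tuple[str, list[str]]]:
    """Group consecutive ops that share a backend into one segment."""
    return [(b, list(group)) for b, group in groupby(ops, key=backend_for)]


ops = ["conv", "conv", "softmax", "matmul_fp16", "gather"]
segments = split_by_backend(ops)
```

Even in this toy version every segment boundary is a hand-off between devices, and a real graph has branches and shared tensors on top of that, which is presumably where the non-trivial part comes in.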

>>embedd+I
How do you know that it wasn’t merely that the blog post elicited multiple people to file the same duplicate bug in Apple’s radar system, which is how they ostensibly prioritize fixes?

replies(1): >>embedd+xm

>>zozbot+52
From a debugging point of view, the author's conclusion was still completely reasonable given the evidence they had

replies(1): >>consta+R31

>>dahcry+Tj
Or, one of the developers of the library saw it, decided to fix it in their spare time (does that exist at Apple?) before it became a bigger thing.

If not, talk about a coincidence: someone reported an issue and everything you mentioned had already been done before that happened, with the only thing missing being the merge into the repository, which was done after the issue was reported. Not unheard of, but it feels less likely than "Engineer decided to fix it".

>>mrtest+Uk
I don't, but the effect is the same, "something might land in the news, let's fix it before it does, since multiple people are reporting the same issue based on this public post someone made".

>>llm_ne+T6
It used to be great, but those days are long gone, see the archived docs.

>>embedd+I
Extremely bad timing on my end then, should've waited for a few more days

>>dahcry+Tj
Three days is, agreed, too short. A week is just about possible, though...

I've seen a blog-post, authored a bug in Radar, assigned it to myself, and fixed it the same day. Whether it goes out in the next release is more a decision for the bug-review-board, but since the engineering manager (that would have been me) sits on that too, it's just a matter of timing and seeing if I can argue the case.

To be fair, the closer we are to a release, the less likely a change is to be accepted unless you can really sweet-talk the rest of the BRB, and there's usually a week of baking before the actual release goes out, but that has sometimes been shrunk for developer-preview releases...

replies(1): >>tensil+mt4

>>TimByt+9l
No it wasn't. A hardware defect so disastrous that it affects floating point computation on the neural engine, yet so minor that it does not affect any of the software on the device utilizing that hardware is exceedingly improbable.

The conclusion, that it was not the fault of the developer was correct, but assuming anything other than a problem at some point in the software stack is unreasonable.

replies(3): >>Dylan1+ui1 >>callme+Vs1 >>ACCoun+wD1

>>consta+R31
> yet so minor that it does not affect any of the software on the device utilizing that hardware

You're being unfair here. The showpiece software that uses that hardware wouldn't install, and almost all software ignores it.

replies(1): >>consta+Zl1

>>Dylan1+ui1
The hardware itself is utilized by many pieces of software on any Apple device. Face ID uses it, Siri uses it, the camera uses it, there are also other Apple on device LLM features, where you could easily test whether the basic capabilities are there.

I highly doubt that you could have a usable iPhone with a broken neural engine, at the very least it would be obvious to the user that there is something very wrong going on.

>>consta+R31
> The conclusion, that it was not the fault of the developer was correct, but assuming anything other than a problem at some point in the software stack is unreasonable.

Aah, the old "you're holding it wrong" defense.

replies(1): >>consta+Ty1

>>callme+Vs1
What do you mean? The developer is perfectly justified in being upset over a basic example not functioning correctly, due to a bug on the part of Apple's developers. It just wasn't reasonable to assume that the bug was due to malfunctioning hardware.

>>consta+R31
Nah.

All neural accelerator hardware models and all neural accelerator software stacks output slightly different results. That is a truth of the world.

The same is true for GPUs and 3d rendering stacks too.

We don't usually notice that, because the tasks themselves tolerate those minor errors. You can't easily tell the difference between an LLM that had 0.00001% of its least significant bits perturbed one way and one that had them perturbed the other.

But you could absolutely construct a degenerate edge case that causes those tiny perturbances to fuck with everything fiercely. And very rarely, this kind of thing might happen naturally.
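A tiny illustration of the general mechanism, in plain Python floats rather than on any accelerator: floating-point addition is not associative, so two stacks that accumulate in a different order can disagree, usually in the last bits and occasionally by a lot. The values are chosen by hand to produce the degenerate case.

```python
# Typical case: the two groupings differ only in the last bit.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
tiny_difference = a != b

# Degenerate case: the small term is absorbed by the large one before
# the large terms cancel, so the order decides the whole result.
values = [1e16, 1.0, -1e16]
left_to_right = (values[0] + values[1]) + values[2]
reordered = (values[0] + values[2]) + values[1]
```

Here `left_to_right` comes out as 0.0 and `reordered` as 1.0, from the same three inputs.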

replies(1): >>consta+GH1

>>ACCoun+wD1
You are correct that implementations of numerical functions in hardware differ, but I do not think you correctly understand the implications of this.

>And very rarely, this kind of thing might happen naturally.

It is not a question of rarity, it is a question of the stability of the numerical problem. Luckily most of the computation in an LLM is matrix multiplication, which is an extremely well understood numerical problem and which can be checked for good condition.

If two different numerical implementations differ significantly on a well-conditioned problem that requires a lot of computation, that would indicate a disastrous fault in the design or condition of the hardware, one that would be noticed by most computations done on that hardware.
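For what "checked for good condition" can mean in practice, here is a minimal sketch that computes the 1-norm condition number of a 2x2 matrix from its explicit inverse. The helper names and the two example matrices are made up for illustration; a real check would use a linear algebra library on the actual weight matrices.

```python
def norm1(m: list[list[float]]) -> float:
    """Matrix 1-norm: the largest absolute column sum."""
    return max(abs(m[0][j]) + abs(m[1][j]) for j in range(2))


def cond1_2x2(m: list[list[float]]) -> float:
    """Condition number in the 1-norm, using the closed-form 2x2 inverse."""
    (a, b), (c, d) = m
    det = a * d - b * c
    inverse = [[d / det, -b / det], [-c / det, a / det]]
    return norm1(m) * norm1(inverse)


well_conditioned = cond1_2x2([[1.0, 0.0], [0.0, 1.0]])
ill_conditioned = cond1_2x2([[1.0, 1.0], [1.0, 1.0001]])
```

A condition number near 1 means small rounding differences stay small; the second, nearly singular matrix has a condition number in the tens of thousands, so it can amplify them by roughly that factor.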

If you weigh the likelihood of OP running into a hardware bug, causing significant numerical error on one specific computational model against the alternative explanation of a problem in the software stack it is clear that the latter explanation is orders of magnitude more likely. Finding a single floating point arithmetic hardware bug is exceedingly rare (although Intel had one), but stacking them up in a way in which one particular neural network does not function, while other functions on the hardware run perfectly fine, is astronomically unlikely.

replies(1): >>ACCoun+dL1

>>consta+GH1
I have seen meaningful instability happen naturally on production NNs. Not to a truly catastrophic degree, but, when you deal in 1024-bit vectors and the results vary by a couple bits from one platform to another, you tend to notice it. And if I've seen it get this bad, then, surely someone has seen worse.

>>zcbenz+(OP)
Why doesn't MLX just detect apple10 support (for Metal)? That excludes all the devices without NA.

>>dahcry+Tj
It would have to be a very serious security bug. Even then, unless they've totally upended their software development workflows in the past couple of years, the Apple I knew extremely well from the inside couldn't turn around a software fix this quickly, from PR to OS release, even if its existence depended on it. There's simply too much bureaucracy and process around submitting anything, no matter how vital.

>>spaced+Sv
The fixing of a bug at Apple is the easy and quick part. It's the submission process from then until it gets released as part of an OS update that is the ridiculously long (and too often difficult) part.

replies(1): >>spaced+5F5

>>jckahn+42
A statement which goes to show that confusing correlation with causation is all you need.

>>tensil+mt4
The submission process is pretty trivial too - as long as your code gets a PR review, and is given the green light to be merged into trunk (which is the 99% case, even if there are PR comments to address), it's going to be in the next daily build.

The releases are the things that are few and far between - generally though, a nominated daily-build (based on the pre-determined release schedule) is triaged and tested by QA and engineering for a while before release, and then ... it's out there...

...Unless something goes unexpectedly wrong with the nomination, anyway. That's pretty rare because builds are constantly being made, regressions identified, and new bugs discovered and earmarked as "must-fix" (or whatever) on a daily basis. B&I have a fairly good feel for how things are going at any given time.

It's really just timing. If you can squeeze another fix in before the cutoff deadline for the nominated build, you're in. If not, you wait until the next one, which can be a while...

replies(1): >>tensil+b67

>>spaced+5F5
Yeah, that was not at all my experience in CoreOS/SWE, where we would sometimes/often have to wait weeks for submissions to turn around in B&I to become part of "daily" builds. Glad you don't have to put up with the same ridiculous crap process.