Still, sad state of affairs that it seems like Apple is still fixing bugs based on what blog posts gets the most attention on the internet, but I guess once they started that approach, it's hard to stop and go back to figuring out priorities on their own.
I don't think that fix is specific to this, but it's absolutely true that MLX is trying to lever every advantage it can find on specific hardware, so it's possible it made a bad choice on a particular device.
What has existed before is the Apple Neural Engine (ANE) which is very different from the newer Neural Accelerator support within the GPU blocks. In fact MLX does not even support ANE yet since at least in previous versions it was hardware-limited to computing FP16 and INT8 MADDs, and not even that fast.
"In fact MLX does not even support ANE yet"
I didn't say otherwise. The ANE is a fantastic unit for small, power-efficient models, like extracting text from images, doing depth modelling, etc. It's not made for LLMs, or the other sorts of experimental stuff MLX is intended for. Though note that MLX's author's reason for not supporting the ANE is that it has a "closed-source" API (https://github.com/ml-explore/mlx/issues/18#issuecomment-184...), making it unsuitable for an open-source project, and given that MLX didn't want to just lean on CoreML. But anyways, the ANE is fantastically fast at what it does, while sipping juice.
In any case, the code change shown should have zero impact on the running of MLX on an iPhone 16 Pro. MLX tries to really leverage platform optimizations so maybe another bifucation is making the wrong choice.
I almost guarantee there is no way they can read this blogpost, escalate it internally, get the appropriate approval to the work item, actually work on the fix, get it through QA and get it live in production in 3 days. That would only happen on really critical issues, and this is definitely not critical enough for that.
The MLX folks have various rationales for not supporting the ANE (at least as of yet), but one of them is that any real support requires implementing explicit splits in the graph of computations, where ANE-suitable portions are to be dispatched to the ANE and everything else goes back to the GPUs. That's not necessarily trivial.
If not, talk about coincident that someone reported an issue and all of that you mentioned was already done before that happened, and the only thing missing was merging the code to the repository which was done after the issue was reported. Not unheard of, but feels less unlikely than "Engineer decided to fix it".
I've seen a blog-post, authored a bug in Radar, assigned it to myself, and fixed it the same day. Whether it goes out in the next release is more a decision for the bug-review-board, but since the engineering manager (that would have been me) sits on that too, it's just a matter of timing and seeing if I can argue the case.
To be fair, the closer we are to a release, the less likely a change is to be accepted unless you can really sweet-talk the rest of the BRB, and there's usually a week of baking before the actual release goes out, but that has sometimes been shrunk for developer-preview releases...
The conclusion, that it was not the fault of the developer was correct, but assuming anything other than a problem at some point in the software stack is unreasonable.
You're being unfair here. The showpiece software that uses that hardware wouldn't install, and almost all software ignores it.
I highly doubt that you could have a usable iPhone with a broken neural engine, at the very least it would be obvious to the user that there is something very wrong going on.
Aah, the old "you're holding it wrong" defense.
All neural accelerator hardware models and all neural accelerator software stacks output slightly different results. That is a truth of the world.
The same is true for GPUs and 3d rendering stacks too.
We don't usually notice that, because the tasks themselves tolerate those minor errors. You can't easily tell the difference between an LLM that had 0.00001% of its least significant bits perturbed one way and one that had them perturbed the other.
But you could absolutely construct a degenerate edge case that causes those tiny perturbances to fuck with everything fiercely. And very rarely, this kind of thing might happen naturally.
>And very rarely, this kind of thing might happen naturally.
It is not a question of rarity, it is a question of the stability of the numerical problem. Luckily most of the computation in an LLM is matrix multiplication, which is s extremely well understood numerical problem and which can be checked for good condition.
Two different numerical implementations on a well conditioned problem and which requires much computation, differing significantly would indicate a disastrous fault in the design or condition of the hardware, which would be noticed by most computations done on that hardware.
If you weigh the likelihood of OP running into a hardware bug, causing significant numerical error on one specific computational model against the alternative explanation of a problem in the software stack it is clear that the later explanation is orders of magnitude more likely. Finding a single floating point arithmetic hardware bug is exceedingly rare (although Intel had one), but stacking them up in a way in which one particular neural network does not function, while other functions on the hardware run perfectly fine, is astronomically unlikely.
The releases are the things that are few and far between - generally though, a nominated daily-build (based on the pre-determined release schedule) is triaged and tested by QA and engineering for a while before release, and then ... it's out there...
...Unless something goes unexpectedly wrong with the nomination, anyway. That's pretty rare because builds are constantly being made, regressions identified, and new bugs discovered and earmarked as "must-fix" (or whatever) on a daily basis. B&I have a fairly good feel for how things are going at any given time.
It's really just timing. If you can squeeze another fix in before the cutoff deadline for the nominated build, you're in. If not, you wait until the next one, which can be a while...