My iPhone 16 Pro Max produces garbage output when running MLX LLMs

>>rafael+(OP)
It is a bug in MLX that has been fixed a few days ago: https://github.com/ml-explore/mlx/pull/3083

>>zcbenz+Ys1
So the underlying issue is that the iPhone 16 Pro SKU was misdetected as having Neural Accelerator (nax) support and this caused silently wrong results. Not a problem with the actual hardware.

>>zozbot+3v1
Apple's documentation is utter garbage, but this code almost seems like a separate issue (and notably the MLX library uses loads of undocumented properties in metal which isn't cool). It looks like the change used to allow the NAX kernel to be used on the iPhone 17 or upcoming 18 if you're on 26.2 or later, to instead only allow it on the iPhone 17 Pro or upcoming 18. I'm fairly sure the GPU arch on the A19 is 17. They changed it so it will only use that kernel on the 17 Pro or upcoming 18, which is notable as the A19 Pro in the 17 Pro has a significantly changed GPU, including GPU tensor cores. The only real change here is that it would limit to the pro variants for the "17" model.

>>llm_ne+Rz1
> The neural accelerator exists in iPhones going back many years.

What has existed before is the Apple Neural Engine (ANE) which is very different from the newer Neural Accelerator support within the GPU blocks. In fact MLX does not even support ANE yet since at least in previous versions it was hardware-limited to computing FP16 and INT8 MADDs, and not even that fast.

zlacker