zlacker

Thread: "My iPhone 16 Pro Max produces garbage output when running MLX LLMs"
1. rainco+8h 2026-02-01 23:08:02
>>rafael+(OP)
Low-level numerical operation optimizations are often not reproducible. For example: https://www.intel.com/content/dam/develop/external/us/en/doc... (2013)

But it's still surprising that the LLM doesn't work on the iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.
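rainco+8h's point about reproducibility usually comes down to ordering: a vectorized or multi-threaded reduction adds the same numbers in a different order than a plain loop, and every intermediate result is rounded. A minimal Python sketch, assuming a contrived four-element input chosen to make the effect visible; `lane_sum` is a hypothetical stand-in for a SIMD-style reduction, not how MLX or Metal kernels actually reduce:

```python
import math

def sequential_sum(xs):
    """Left-to-right accumulation, like a naive scalar loop."""
    total = 0.0
    for x in xs:
        total += x
    return total

def lane_sum(xs, lanes):
    """Hypothetical SIMD-style reduction: element i goes into partial sum
    i % lanes, then the partials are combined. Same additions, new order."""
    partials = [0.0] * lanes
    for i, x in enumerate(xs):
        partials[i % lanes] += x
    return sequential_sum(partials)

values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))  # 1.0  (1e16 + 1.0 rounds back to 1e16)
print(lane_sum(values, 2))     # 2.0  (the large terms cancel first)
print(math.fsum(values))       # 2.0  (correctly rounded exact sum)
```

Both reductions follow IEEE 754 rounding at every step; they differ only because the additions happen in a different order, which is why bit-identical results across devices can't be taken for granted when kernels reduce differently.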

◧◩
2. bri3d+Dh 2026-02-01 23:11:50
>>rainco+8h
Yes, "floating point accumulation doesn't commute" is a mantra everyone should have in their head, and when I first read this article, I was champing at the bit to dismiss it out of hand for that reason.

But what got me about this is that:

* every other Apple device delivered the same results

* Apple's own LLM silently failed on this device

To me, that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.

◧◩◪
3. sva_+Ct 2026-02-02 00:53:06
>>bri3d+Dh
> floating point accumulation doesn't commute

It is commutative (except for NaN). It isn't associative though.
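sva_'s distinction can be checked directly. A minimal Python sketch with the usual textbook values: swapping the two operands of a single addition never changes the result, but regrouping three operands can:

```python
a, b, c = 0.1, 0.2, 0.3

# Commutativity holds for non-NaN floats: IEEE 754 returns the correctly
# rounded exact sum, and the exact sum doesn't depend on operand order.
print(a + b == b + a)              # True

# Associativity does not: each intermediate sum is rounded separately.
print((a + b) + c)                 # 0.6000000000000001
print(a + (b + c))                 # 0.6
print((a + b) + c == a + (b + c))  # False
```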

◧◩◪◨
4. ekelse+eD 2026-02-02 02:22:47
>>sva_+Ct
I think it commutes even when one or both inputs are NaN? The output is always NaN.
◧◩◪◨⬒
5. addaon+ND 2026-02-02 02:28:01
>>ekelse+eD
NaNs are distinguishable. /Which/ NaN you get doesn't commute.
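The NaN point is easier to see at the bit level. An IEEE 754 double is a NaN whenever its exponent bits are all ones and its 52-bit mantissa is non-zero, so there are many distinct NaN bit patterns. A minimal Python sketch that uses `struct` to reinterpret bits; the payload values 1 and 2 are arbitrary, and which payload an addition returns depends on the hardware (and on whether the compiler swapped the operands), so the sketch prints it instead of assuming it:

```python
import math
import struct

def float_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def bits_from_float(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Two quiet NaNs (exponent all ones, top mantissa bit set) that differ
# only in their low payload bits.
nan_a = float_from_bits(0x7FF8_0000_0000_0001)
nan_b = float_from_bits(0x7FF8_0000_0000_0002)

print(math.isnan(nan_a), math.isnan(nan_b))               # True True
print(bits_from_float(nan_a) != bits_from_float(nan_b))   # True

# Both orders produce a NaN; the surviving payload is platform-dependent.
print(hex(bits_from_float(nan_a + nan_b)))
print(hex(bits_from_float(nan_b + nan_a)))
```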
◧◩◪◨⬒⬓
6. ekelse+nG 2026-02-02 02:55:13
>>addaon+ND
I guess at the bit level, but not at the level of computation? Anything that relies on bit patterns of NaNs behaving in a certain way (like how they propagate) is in dangerous territory.
◧◩◪◨⬒⬓⬔
7. addaon+OJ 2026-02-02 03:30:28
>>ekelse+nG
> Anything that relies on bit patterns of nans behaving in a certain way (like how they propagate) is in dangerous territory.

Why? This is well specified by IEEE 754. Many runtimes (e.g. for JavaScript) use NaN boxing. Treating floats as a semi-arbitrary selection of rational numbers plus a handful of special values is /more/ correct than treating them as real numbers, but treating them as actually specified does give more flexibility and power.
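For readers unfamiliar with the NaN boxing addaon mentions: since any double with all-ones exponent bits and a non-zero mantissa is a NaN, a runtime can hide a non-float value (an integer, a pointer, a type tag) in the payload and still store every value as a 64-bit double. A minimal Python sketch; the tag layout and the names `box_int`, `is_boxed`, `unbox_int` are hypothetical and much simpler than what real engines use:

```python
import math
import struct

# Hypothetical tag: sign 0, exponent all ones, quiet bit set, plus one
# extra mantissa bit as a "boxed" marker, i.e. top 16 bits = 0x7FFC.
TAG = 0x7FFC_0000_0000_0000
TAG_MASK = 0xFFFF_0000_0000_0000
PAYLOAD_MASK = 0x0000_0000_FFFF_FFFF  # low 32 bits hold the boxed integer

def _to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def _from_bits(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def box_int(n: int) -> float:
    """Store an unsigned 32-bit integer inside a NaN's payload."""
    if not 0 <= n <= 0xFFFF_FFFF:
        raise ValueError("only unsigned 32-bit values fit in this sketch")
    return _from_bits(TAG | n)

def is_boxed(x: float) -> bool:
    """True if x carries the tag. Assumes genuine NaNs are canonicalized
    (e.g. to float('nan')) before storage, so they never match the tag."""
    return (_to_bits(x) & TAG_MASK) == TAG

def unbox_int(x: float) -> int:
    """Recover the integer from a boxed value."""
    if not is_boxed(x):
        raise TypeError("not a boxed integer")
    return _to_bits(x) & PAYLOAD_MASK

v = box_int(42)
print(math.isnan(v), is_boxed(v), unbox_int(v))  # True True 42
print(is_boxed(3.14), is_boxed(float("nan")))    # False False
```

The scheme only holds together if ordinary arithmetic never leaves an arbitrary NaN bit pattern in a slot that is later read as a boxed value, which is why NaN-boxing runtimes typically canonicalize NaNs; it is also exactly the kind of reliance on NaN bit patterns ekelse+nG was wary of.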
