zlacker

My iPhone 16 Pro Max produces garbage output when running MLX LLMs

submitted by rafael+(OP) on 2026-02-01 20:51:56 | 428 points 176 comments
[view article] [source] [go to bottom]

NOTE: showing posts with links only show all posts
3. rainco+8h[view] [source] 2026-02-01 23:08:02
>>rafael+(OP)
Low-level optimizations of numerical operations are often not reproducible. For example: https://www.intel.com/content/dam/develop/external/us/en/doc... (2013)

But it's still surprising that the LLM doesn't work on the iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.
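
For a rough sense of why quantization alone shouldn't produce garbage, here's a small sketch of an int8 round trip (plain NumPy for illustration, not MLX's actual quantization path; the row width and per-row scaling are just assumptions):

    import numpy as np

    # Fake weight row; real LLM weight rows are similarly well-behaved in range.
    w = np.random.randn(4096).astype(np.float32)

    # Symmetric int8 quantization with a single per-row scale.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

    # Dequantize and measure the worst-case per-element error.
    w_hat = q.astype(np.float32) * scale
    print("max abs error:", np.abs(w - w_hat).max())  # about scale/2

The per-element error is bounded by roughly half the scale, which is small next to the weights themselves, so fully garbled output usually points at an actual bug rather than quantization noise.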

◧◩◪
15. macint+pr[view] [source] [discussion] 2026-02-02 00:32:45
>>sen+Dm
I haven't watched the video, but clearly there's a broad problem with the iOS keyboard recently.

>>46232528 ("iPhone Typos? It's Not Just You - The iOS Keyboard is Broken")

◧◩◪
17. varun_+ns[view] [source] [discussion] 2026-02-02 00:41:52
>>Vorpal+Th
Built-in calculator apps are surprisingly underbaked... I'm surprised that neither of the big two operating systems has elected to ship something comparable to a real calculator built in. It would be nice if we could preview the whole expression as we type it.

I use the NumWorks emulator app whenever I need something more advanced. It's pretty good: https://www.numworks.com/simulator/

◧◩
21. JimboO+3z[view] [source] [discussion] 2026-02-02 01:44:25
>>csmant+vp
(This is a total digression, so apologies)

My mind instantly answered that with "bright", which is what you get when you combine the sun and moon radicals to make 明 (https://en.wiktionary.org/wiki/%E6%98%8E).

Anyway, that question is not without reasonable answers. "Full Moon" might make sense too. No obvious deterministic answer, though, naturally.

◧◩
46. realit+KG[view] [source] [discussion] 2026-02-02 03:00:09
>>Button+dg
GraphNCalc83 is awesome [0].

[0] https://apps.apple.com/us/app/graphncalc83/id744882019

◧◩
65. waters+JV[view] [source] [discussion] 2026-02-02 05:46:20
>>Button+dg
PCalc -- because it runs on every Apple platform since the Mac Classic:

https://pcalc.com/mac/thirty.html

My other favorite calculator is Free42, or its larger-display version, Plus42:

https://thomasokken.com/plus42/

For a CAS tool on a pocket mobile device, I haven't found anything better than MathStudio (formerly SpaceTime):

https://mathstud.io

You can run that in your web browser, but they maintain a mobile app version. It's like a self-hosted Wolfram Alpha.

◧◩◪◨
68. SauntS+YW[view] [source] [discussion] 2026-02-02 06:01:55
>>christ+AC
Reminds me of this AI word combination game recently shared on HN, with almost exactly these mechanics:

https://neal.fun/infinite-craft/

For the record, Sun+Moon is indeed eclipse.

72. docfor+vZ[view] [source] 2026-02-02 06:28:14
>>rafael+(OP)
Interesting post, but the last bit of logic pointing to the Neural Engine for MLX doesn't hold up. MLX supports running on the CPU, on Apple GPUs via Metal, and on NVIDIA GPUs via CUDA: https://github.com/ml-explore/mlx/tree/main/mlx/backend
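
If you want to check where the divergence creeps in, a minimal sketch with the Python mlx package (the shapes and tolerance here are arbitrary, and the GPU stream needs a Metal-capable Apple-silicon device) is to run the same op on the CPU and GPU streams and compare:

    import mlx.core as mx

    x = mx.random.normal((4, 512))
    w = mx.random.normal((512, 512))

    # Same matmul, evaluated on the CPU backend...
    with mx.stream(mx.cpu):
        y_cpu = mx.matmul(x, w)
        mx.eval(y_cpu)

    # ...and on the Metal GPU backend.
    with mx.stream(mx.gpu):
        y_gpu = mx.matmul(x, w)
        mx.eval(y_gpu)

    # Backends may legitimately differ by rounding, so compare with a
    # tolerance rather than exact equality.
    print(mx.allclose(y_cpu, y_gpu, atol=1e-5))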
◧◩◪◨⬒⬓⬔⧯▣▦
74. ekelse+U21[view] [source] [discussion] 2026-02-02 07:10:39
>>addaon+hY
I also don't have access to the spec, but the people writing Rust do and they claim this: "IEEE makes almost no guarantees about the sign and payload bits of the NaN"

https://rust-lang.github.io/rfcs/3514-float-semantics.html

See also this section of wikipedia https://en.wikipedia.org/wiki/NaN#Canonical_NaN

"On RISC-V, most floating-point operations only ever generate the canonical NaN, even if a NaN is given as the operand (the payload is not propagated)."

And from the same article:

"IEEE 754-2008 recommends, but does not require, propagation of the NaN payload." (Emphasis mine)

I call bullshit on the statement "specifically binary operations combining two NaN inputs must result in one of the input NaNs." It is definitely not in the spec.
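
If you're curious what your own hardware does with payloads, here's a quick way to poke at the binary64 bits (Python, with two arbitrary quiet-NaN payloads; this only shows your platform's behaviour, not anything the spec mandates):

    import struct

    def bits(x: float) -> str:
        """Hex of the IEEE 754 binary64 encoding of x."""
        return struct.pack(">d", x).hex()

    # Two quiet NaNs with different payloads, built from raw bit patterns.
    nan_a = struct.unpack(">d", bytes.fromhex("7ff8000000000001"))[0]
    nan_b = struct.unpack(">d", bytes.fromhex("7ff8000000000002"))[0]

    print(bits(nan_a), bits(nan_b))

    # Whether either input's payload survives is platform-dependent;
    # IEEE 754-2008 only *recommends* propagation, and e.g. RISC-V
    # returns the canonical NaN instead.
    print(bits(nan_a + nan_b))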

◧◩
81. idk1+Aa1[view] [source] [discussion] 2026-02-02 08:33:41
>>csmant+vp
As an aside, one of my very nice family members likes tarot card reading, and I think you'd get an extremely different answer to "What's moon plus sun?" - something, I'd guess, along the lines of "Mixed signals or insecurity get resolved by openness and real communication", since the two are opposites. It's kind of fascinating, the range of answers to that question. As a couple of other people have mentioned, it could mean loads of things, so I thought I'd add one in there.

I'll just add that if you think this advice applies to you, that's the Barnum effect: https://en.wikipedia.org/wiki/Barnum_effect

◧◩◪
91. DavidV+Gm1[view] [source] [discussion] 2026-02-02 10:41:28
>>bri3d+Dh
I would go even further and state that "you should never assume that floating point functions will evaluate the same on two different computers, or even on two different versions of the same application", as the results of floating point evaluations can differ depending on the platform, compiler optimizations, compilation flags, the run-time FPU environment (rounding mode, etc.), and even the memory alignment of run-time data.

There's a C++26 paper about compile-time math optimizations with a good overview and discussion of some of these issues [P1383]. The paper explicitly states:

1. It is acceptable for evaluation of mathematical functions to differ between translation time and runtime.

2. It is acceptable for constant evaluation of mathematical functions to differ between platforms.

So C++ has very much accepted the fact that floating point functions should not be presumed to give identical results in all circumstances.

Now, it is of course possible to ensure that floating point-related functions give identical results on all your target machines, but it's usually not worth the hassle.

[P1383]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p13...
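
For a concrete feel of why a compiler is even allowed to change the answer just by reassociating, here's a tiny Python illustration (the values are contrived so that a single reordering flips the result):

    import math

    xs = [1e16, 1.0, -1e16]

    # Left-to-right summation loses the 1.0 to rounding...
    left_to_right = sum(xs)            # 0.0

    # ...while reordering (or compensated summation) keeps it.
    reordered = xs[0] + xs[2] + xs[1]  # 1.0
    compensated = math.fsum(xs)        # 1.0

    print(left_to_right, reordered, compensated)

Any optimization, vectorization, or FMA contraction that changes the order or precision of intermediate results can shift the output in exactly this way.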

99. zcbenz+Ys1[view] [source] 2026-02-02 11:46:08
>>rafael+(OP)
It is a bug in MLX that was fixed a few days ago: https://github.com/ml-explore/mlx/pull/3083
◧◩◪
105. syntax+lx1[view] [source] [discussion] 2026-02-02 12:25:31
>>embedd+Gt1
I don't think so. You can see the issue ticket linked in the PR; whether that issue is related to the blog post is unknown: https://github.com/ml-explore/mlx-swift-examples/issues/462
◧◩◪◨⬒
114. llm_ne+AH1[view] [source] [discussion] 2026-02-02 13:29:17
>>zozbot+SA1
Sure, I directly and explicitly talked about Apple's version of tensor cores in the GPU. But the ANE is by every definition a neural accelerator. Yes, I'm aware of Apple's weird branding for their tensor cores.

"In fact MLX does not even support ANE yet"

I didn't say otherwise. The ANE is a fantastic unit for small, power-efficient models, like extracting text from images, doing depth modelling, etc. It's not made for LLMs, or the other sorts of experimental stuff MLX is intended for. Note, though, that the MLX authors' stated reason for not supporting the ANE is that it has a "closed-source" API (https://github.com/ml-explore/mlx/issues/18#issuecomment-184...), making it a poor fit for an open-source project that didn't want to just lean on Core ML. But anyway, the ANE is fantastically fast at what it does, while sipping juice.

In any case, the code change shown should have zero impact on running MLX on an iPhone 16 Pro. MLX tries hard to leverage platform-specific optimizations, so maybe some other bifurcation is making the wrong choice.

◧◩◪
145. butlik+3B2[view] [source] [discussion] 2026-02-02 18:17:30
>>JimboO+3z
You could play Infinite Craft and find out what the game thinks it is: https://neal.fun/infinite-craft/

Edit: Spoiler -

It's 'Eclipse'

◧◩◪◨
160. jasinj+834[view] [source] [discussion] 2026-02-03 01:02:47
>>danpal+Mp
Huh. I never knew "champing" was the proper spelling. [0]

[0] https://www.npr.org/sections/memmos/2016/06/09/605796769/che...

◧◩◪◨⬒
162. gryffy+7n4[view] [source] [discussion] 2026-02-03 03:26:07
>>joseph+251
Qalculate <https://qalculate.github.io/> is my favourite REPL-like calculator, although it unfortunately lacks an iOS app. It feels similar to using an HP 48-series calculator.

Numbat <https://numbat.dev/> is similar, but more CLI/REPL-focused, and with more of an emphasis on being a programming language.

◧◩◪◨⬒⬓⬔⧯▣▦
173. fragme+P99[view] [source] [discussion] 2026-02-04 11:19:58
>>throwa+lF6
That's still not theater though. Annoying? Yes, quite! But according to the definition:

> Security theater is the practice of implementing security measures that are considered to provide the feeling of improved security while doing little or nothing to achieve it.[1][2]

https://en.wikipedia.org/wiki/Security_theater#:~:text=Secur...

Just because it's annoying to you doesn't make it security theater.

[go to top]