I hope that research into understanding LLM qualia eventually allows us to understand e.g. what it's like to [be a bat](https://en.wikipedia.org/wiki/What_Is_It_Like_to_Be_a_Bat%3F).
That's essentially the core idea in Coconut[1][2], to keep the reasoning traces in a continuous space.
[1] https://en.m.wikipedia.org/wiki/The_Unreasonable_Effectivene...
It is rather "unreasonable" to think we can explore the world simply through pen and paper, from the comfort of a chair. You'd think you'd need to go out and touch grass, but incredibly this is not necessary.
| The first point is that the enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and that there is no rational explanation for it. Second, it is just this uncanny usefulness of mathematical concepts that raises the question of the uniqueness of our physical theories.
Which is exactly why a lot of these other things are overused. Hamming's seems like an extension or corollary[1] and I even think Norvig's (Halevy's) is highly appropriate[2]. It is "unreasonable" to think these things would be effective.
-------------------------------------
With this paper? I think it is fine. It is being used in a similar way to Wigner, with similar context.
I can see two camps. One has always interpreted the CoT as analogous to a model's internal dialogue. The other has always thought there's a much larger gap between the manipulations within latent representations and what has been decoded, and that the two need not be strongly aligned.[3] To the former, the results here would be shocking, while to the latter it is "yes, and?" Clearly they're addressing the former camp. There were plenty of people that Wigner did not need to convince, too.
I'm of the latter camp[4], and I'm happy people are not just asserting and are demonstrating. Honestly, I'm even frequently upset when works get dismissed because they "demonstrate something we already knew" but no one had ever actually demonstrated. The proof and the evidence are more important than the answer. Quite often we're highly certain about results but they are difficult to even evidence (let alone prove). I mean it would be quite silly to dismiss a proof that P != NP, even though the vast majority of us have long been convinced that this is the relationship we'll end up with. Yet, no one's done it.
-------------------------------------
[0] https://web.archive.org/web/20210212111540/http://www.dartmo...
[1] https://math.dartmouth.edu/~matc/MathDrama/reading/Hamming.h...
[2] https://static.googleusercontent.com/media/research.google.c...
[3] Both camps can be further broken down too. There are lots of nuances and opinions here, and the lines really get fuzzy as we try to make the distinction more accurate. I don't want to pretend there's a hard defining line, but the distinction helps the discussion and I think it is reasonably accurate. Let me know if you think it is a gross mischaracterization.
[4] I can expand more on why this side seems "obvious" to me. But a warning: you can probably guess I'm not good at being terse.
[Note]: I'd even go so far as to say we should revisit Wigner's argument around AI. I'm certain mathematics can be and will be "unreasonably effective." But not enough time has been dedicated to formulating the right type of math to use. We really do have to invent a new kind here. This may sound weird to non-mathematicians, but even physics uses multiple kinds of mathematics. The operations, fields, and algebras you use in one part may not be appropriate in another part. That's okay. But we don't have a TOE yet either, and bringing all this together is a critical part of finding one.
- https://physics.allen-zhu.com/part-2-grade-school-math/part-...
But I think it is a good example that fits the OP's critique (I don't think the critique fits the arXiv paper, even though I expected the main results; see my main comment).
The "unreasonableness" in Karpathy's post[1] is using sequencing to process non-sequential data. But the reason this isn't unreasonable is that we explicitly expect non-sequential processes to be able to be reformulated as sequential ones.
The SVHN (house numbers) example he shows is actually a great illustration of this. We humans don't process that all at once. Our eyes similarly dart around, even if very fast. Or we might think about how to draw a picture. We don't do everything at once, but we work in sections, building up, and have layers that end up being ordered even though this technically isn't a requirement. I'm actually struggling to think of things that cannot be broken down into sequences. He says as much here:
| an important point to realize is that even if your inputs/outputs are fixed vectors, it is still possible to use this powerful formalism to process them in a sequential manner.
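To make that quoted point concrete, here is a minimal sketch (not Karpathy's code, and not an RNN) of consuming the same fixed-size input either in one step or as a sequence of glimpses with a running state. The function names, the `glimpse_rows` parameter, and the toy "image" are all made up for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # a toy fixed-size "image": rows of pixel values


def process_all_at_once(image: Image) -> int:
    """Consume the whole fixed-size input in a single step."""
    return sum(sum(row) for row in image)


def process_sequentially(image: Image, glimpse_rows: int = 2) -> Tuple[int, int]:
    """Consume the same input as a sequence of glimpses, carrying a running state.

    Returns the final state and the number of sequential steps taken.
    """
    state = 0
    steps = 0
    for top in range(0, len(image), glimpse_rows):
        glimpse = image[top:top + glimpse_rows]  # look at only a few rows
        state += sum(sum(row) for row in glimpse)  # fold the glimpse into the state
        steps += 1
    return state, steps


image = [[1, 2], [3, 4], [5, 6]]
total = process_all_at_once(image)
seq_total, num_steps = process_sequentially(image)
```

Both routes reach the same answer; the sequential one just gets there over several steps. A learned model would swap the hand-written update for a trained one, but the reformulation is the same.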
So really the question is: what part of this was unreasonable? Or what part was unexpected? Honestly, we should be expecting this as the nature of neural nets is itself sequential, data being processed layer by layer. Hell, every computer program has a trace, which is sequential. I can give tons of examples. So it is quite reasonable that sequential processing should work.
[0] https://static.googleusercontent.com/media/research.google.c...
[1] https://karpathy.github.io/2015/05/21/rnn-effectiveness/
> I think you misinterpret what it's about. He's pointing out how remarkable it is that the universe obeys laws like...
I apologize for not being clear. But we are talking about the same thing.
> The pre scientific understanding of the world was it was driven my gods and spirits.
Wigner's paper was written in 1960. I do not think such claims needed to be made. Those arguments were prolific and had been made for centuries. He did not need to convince anyone in the scientific community that the laws of nature were not driven by gods and spirits. By the 1960s the scientific age was already mature and it was well established in the community that the laws of nature are not the domain of gospel.
| "Well, now you are pushing your joke too far," said the classmate, "surely the population has nothing to do with the circumference of the circle."
The point is made here. It is surprising that math describes reality. It is surprising that a circle has anything to do with a population.
I really did mean it when I said "about something that would sound silly today". We take this for granted now, with 60 years of working under this framework, but this wasn't always so. It seems silly now because much of the math we learn is taught in science classes, and even outside of them we focus on teaching the math most related to science, but this is a small portion of a much larger field. Fwiw, I am not saying this as a complete outsider; I have a degree in physics.
It is also worth paying attention to the fact that Wigner helped create Mathematical Physics[0]. "Mathematical Physics" is not a pleonasm.
Don't take it just on my word! The Wiki page says something extremely similar!
| In it, Wigner observes that a theoretical physics's mathematical structure often points the way to further advances in that theory and to empirical predictions. Mathematical theories often have predictive power in describing nature. [1]
| Wigner argues that mathematical concepts have applicability far beyond the context in which they were originally developed[1]
> The mathematical laws were only discovered by scientific investigation.
I should make sure this is clear though (unsure which interpretation you intend). Math and science aren't interchangeable. Physics uses the language of math as its main method for developing theories and logic. But it is also important to stress that it doesn't use the same mathematical language throughout. The frameworks that those working in relativity use are not useful for those that work in quantum mechanics. If the math were uniform, we would not be dedicating so much time to bridging these. Nor is math absolute here, as it is a map, and we still rely heavily on the language of experimental evidence.
Yes, he was saying "use maths". Yes, it sounds silly today, but so do a lot of things that needed to be said in the past. I see no reason that the (now) obvious claim by Copernicus would make him any less famous.
[0] https://en.wikipedia.org/wiki/Mathematical_physics
[1] https://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness...
https://arxiv.org/abs/2502.18600
A similar level of results in a fraction of the tokens means similar quality at lower cost on longer runs.
But also, when I'm interacting and need to read the token responses, I can read shorter responses way faster, so my own speed is faster.
Do you see your result as putting that paradigm in question, or does the explicit reasoning assessment perhaps ameliorate the issue?
I really encourage you to read that wiki page.
| The quantum theory of the Lamb shift, as conceived by Bethe and established by Schwinger, is a purely mathematical theory and the only direct contribution of experiment was to show the existence of a measurable effect. The agreement with calculation is better than one part in a thousand.
I think you're missing a lot of context in that physics was highly non-mathematical in the past. Physicists called Einstein a mathematician. It isn't too hard to see why, given that he asserted that his theories were correct and didn't need experimental confirmation.
| Hamming argues that Albert Einstein's pioneering work on special relativity was largely "scholastic" in its approach. He knew from the outset what the theory should look like (although he only knew this because of the Michelson–Morley experiment), and explored candidate theories with mathematical tools, not actual experiments. Hamming alleges that Einstein was so confident that his relativity theories were correct that the outcomes of observations designed to test them did not much interest him. If the observations were inconsistent with his theories, it would be the observations that were at fault.
Hell, go read Ian Hacking, any metaphysics, or ask ChatGPT. They will confirm what I'm saying. Some of this is even discussed in An Opinionated History of Mathematics[0], though it is much more focused on math. I'm mentioning it mostly because it is good and helps provide some of that historical context.
It is kinda crazy that a thing we created, without the specific intent of modeling the world, ended up being so great at modeling the world. That's the unreasonable effectiveness.
In fairness, to change my opinion, you would need to show me some chain of reasoning or a conversation Wigner is clearly responding to that involves religion. That is the kind of context I see, except around math not being physics, and it is what drives my interpretation.
[0] https://intellectualmathematics.com/opinionated-history-of-m...