As for the fact that it gets things wrong sometimes: sure, this doesn't mean it actually learned every algorithm (in whichever model you may be thinking about). But the nice thing is that we now have this proof via category theory, which lets us frame and understand what has occurred, and consider how to align these systems to learn algorithms better.
A system that can will probably adopt a different acronym (and gosh that will be an exciting development... I look forward to the day when we can dispatch trivial proofs to be formalized by a machine learning algorithm so that we can focus on the interesting parts while still having the entire proof formalized).
What's a token?
1. ChatGPT knows the algorithm for adding two numbers of arbitrary magnitude.
2. It often fails to use the algorithm in point 1 and hallucinates the result.
Knowing something doesn't mean it will get it right all the time. Rather, an LLM is almost guaranteed to mess up some of the time due to the probabilistic nature of its sampling. But this alone doesn't prove that it only brute-forced task X.
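To make the sampling point concrete, here's a minimal sketch in Python (with made-up probabilities, not taken from any real model) of categorical next-token sampling: even when the correct token is by far the most likely one, a nonzero share of samples lands on something else.

```python
import random

# Hypothetical next-token distribution for the final digit of a long sum.
# Made-up numbers: the correct digit gets most of the probability mass, but not all of it.
next_token_probs = {"7": 0.90, "1": 0.05, "9": 0.03, "3": 0.02}

def sample_token(probs: dict[str, float]) -> str:
    """Categorical sampling: pick a token in proportion to its probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Over many runs, roughly 10% of completions pick a wrong digit,
# even though the model "prefers" the correct answer at every step.
samples = [sample_token(next_token_probs) for _ in range(10_000)]
error_rate = sum(tok != "7" for tok in samples) / len(samples)
print(f"wrong-digit rate: {error_rate:.2%}")
```

Greedy decoding would sidestep this particular failure mode, but chat interfaces generally sample with a nonzero temperature.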
You're using it wrong. If you asked a human to do the same operation in under 2 seconds without paper, would the human be more accurate?
On the other hand, if you ask for a step-by-step execution, the LLM can solve it.
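For reference, the "algorithm for adding two numbers of arbitrary magnitude" being discussed is just schoolbook long addition. A minimal Python sketch of the step-by-step version (digit by digit, carrying as you go) looks roughly like this:

```python
def long_addition(a: str, b: str) -> str:
    """Schoolbook addition on decimal strings, digit by digit with a carry."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        digit_a = int(a[i]) if i >= 0 else 0
        digit_b = int(b[j]) if j >= 0 else 0
        total = digit_a + digit_b + carry
        result.append(str(total % 10))  # write the units digit of this column
        carry = total // 10             # carry the tens digit into the next column
        i, j = i - 1, j - 1
    return "".join(reversed(result))

print(long_addition("987654321987654321", "123456789123456789"))
# 1111111111111111110
```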
Tokens exist because transformers don't operate directly on bytes or on whole words: bytes would make sequences too long and slow, while a word-level vocabulary would be enormous, with many words appearing rarely or never. The token system lets a relatively small set of symbols encode any input. As a rough approximation, 1 token ≈ 4 characters, or about three quarters of an English word.
So tokens are the data type of input and output, and the unit of measure for billing and context size for LLMs.
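If you want to see this concretely, OpenAI's tiktoken library exposes the BPE encodings its models use. The snippet below assumes tiktoken is installed and uses the cl100k_base encoding as an example:

```python
import tiktoken

# cl100k_base is one of OpenAI's published encodings, used by recent chat models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens exist because transformers don't work on bytes or words."
tokens = enc.encode(text)

print(len(text), "chars ->", len(tokens), "tokens")   # roughly 4 chars per token
print([enc.decode([t]) for t in tokens])              # the individual token strings

# Long numbers are split into multi-digit chunks rather than single digits,
# which is part of why digit-level arithmetic is awkward for an LLM.
print([enc.decode([t]) for t in enc.encode("987654321987654321")])
```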
Your argument is the equivalent of saying humans can't do math because they rely on calculators.
In the end what matters is whether the problem is solved, not how it is solved.
(assuming that the "how" has reasonable costs)
ChatGPT needs to do the same process to solve the same problem. It hasn’t memorized the addition table up to 10 digits and neither have you.
Well, duh. We’re trying to build a human like mind, not a calculator.
I just asked ChatGPT to do the calculation both by using a calculator and by using the algorithm step-by-step. In both cases it got the answer wrong, with different results each time.
More concerning, though, is that the answer was visually close to correct (it transposed some digits). That makes it especially hard to rely on, because it's essentially lying about using an algorithm when it's actually just predicting the number as tokens.
Anyways, criticizing its math abilities is a bit silly considering it’s a language model, not a math model. The fact I can teach it how to do math in plain English is still incredible to me.
I digress. The critique I have for it is much more broad than just its math abilities. It makes loads of mistakes in every single nontrivial thing it does. It’s not reliable for anything. But the real problem is that it doesn’t signal its unreliability the way an unreliable human worker does.
Humans we can’t rely on don’t show up to work, come in drunk or stoned, steal stuff, or show some other obviously bad behaviour. ChatGPT, on the other hand, mimics the model employee who is tireless and punctual, who always gets work done early and more elaborately than expected. But unfortunately, it also fills that elaborate result with countless errors and outright fabrications, disguised as well as it can as real work.
If a human worker did this we’d call it a highly sophisticated fraud. It’s like the kind of thing Saul Goodman would do to try to destroy the reputation of his brother. It’s not the kind of thing we should celebrate at all.
Haven't humans been shown, time and time again, to be constantly anticipating the next phrase in a passage of music, or the next word in a sentence?