As for the fact that it gets things wrong sometimes: sure, this doesn't mean it actually learned every algorithm (in whichever model you may be thinking about). But the nice thing is that we now have this proof via category theory, which lets us frame and understand what has occurred, and consider how to align these systems to learn algorithms better.
A system that can will probably adopt a different acronym (and gosh that will be an exciting development... I look forward to the day when we can dispatch trivial proofs to be formalized by a machine learning algorithm so that we can focus on the interesting parts while still having the entire proof formalized).
What's a token?
1. ChatGPT knows the algorithm for adding two numbers of arbitrary magnitude.
2. It often fails to use the algorithm in point 1 and hallucinates the result.
Knowing something doesn't mean it will get it right all the time. Rather, an LLM is almost guaranteed to mess up some of the time due to the probabilistic nature of its sampling. But this alone doesn't prove that it only brute-forced task X.
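To make the sampling point concrete, here's a minimal sketch in Python (with made-up probabilities, not taken from any real model) of categorical next-token sampling: even when the correct token is by far the most likely one, a nonzero share of samples lands on something else.

```python
import random

# Hypothetical next-token distribution for the final digit of a long sum.
# Made-up numbers: the correct digit gets most of the probability mass, but not all of it.
next_token_probs = {"7": 0.90, "1": 0.05, "9": 0.03, "3": 0.02}

def sample_token(probs: dict[str, float]) -> str:
    """Categorical sampling: pick a token in proportion to its probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Over many runs, roughly 10% of completions pick a wrong digit,
# even though the model "prefers" the correct answer at every step.
samples = [sample_token(next_token_probs) for _ in range(10_000)]
error_rate = sum(tok != "7" for tok in samples) / len(samples)
print(f"wrong-digit rate: {error_rate:.2%}")
```

Greedy decoding would sidestep this particular failure mode, but chat interfaces generally sample with a nonzero temperature.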
You're using it wrong. If you asked a human to do the same operation in under 2 seconds without paper, would the human be more accurate?
On the other hand, if you ask for a step-by-step execution, the LLM can solve it.
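For reference, the "algorithm for adding two numbers of arbitrary magnitude" being discussed is just schoolbook long addition. A minimal Python sketch of the step-by-step version (digit by digit, carrying as you go) looks roughly like this:

```python
def long_addition(a: str, b: str) -> str:
    """Schoolbook addition on decimal strings, digit by digit with a carry."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        digit_a = int(a[i]) if i >= 0 else 0
        digit_b = int(b[j]) if j >= 0 else 0
        total = digit_a + digit_b + carry
        result.append(str(total % 10))  # write the units digit of this column
        carry = total // 10             # carry the tens digit into the next column
        i, j = i - 1, j - 1
    return "".join(reversed(result))

print(long_addition("987654321987654321", "123456789123456789"))
# 1111111111111111110
```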
Tokens exist because transformers don't operate directly on bytes or on whole words: bytes would make sequences too long and slow, while a word-level vocabulary would be enormous, with many words appearing rarely or never. The token system lets a relatively small set of symbols encode any input. As a rough approximation, 1 token ≈ 4 characters, or about three quarters of an English word.
So tokens are the data type of input and output, and the unit of measure for billing and context size for LLMs.
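If you want to see this concretely, OpenAI's tiktoken library exposes the BPE encodings its models use. The snippet below assumes tiktoken is installed and uses the cl100k_base encoding as an example:

```python
import tiktoken

# cl100k_base is one of OpenAI's published encodings, used by recent chat models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens exist because transformers don't work on bytes or words."
tokens = enc.encode(text)

print(len(text), "chars ->", len(tokens), "tokens")   # roughly 4 chars per token
print([enc.decode([t]) for t in tokens])              # the individual token strings

# Long numbers are split into multi-digit chunks rather than single digits,
# which is part of why digit-level arithmetic is awkward for an LLM.
print([enc.decode([t]) for t in enc.encode("987654321987654321")])
```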
Your argument is the equivalent of saying humans can't do math because they rely on calculators.
In the end what matters is whether the problem is solved, not how it is solved.
(assuming that the "how" has reasonable costs)
ChatGPT needs to do the same process to solve the same problem. It hasn’t memorized the addition table up to 10 digits and neither have you.
Well, duh. We’re trying to build a human like mind, not a calculator.
I just asked ChatGPT to do the calculation both by using a calculator and by using the algorithm step-by-step. In both cases it got the answer wrong, with different results each time.
More concerning, though, is that the answer was visually close to correct (it transposed some digits). That makes it especially hard to rely on, because it's essentially lying about using an algorithm when it's actually just predicting the number as tokens.
Anyways, criticizing its math abilities is a bit silly considering it’s a language model, not a math model. The fact I can teach it how to do math in plain English is still incredible to me.
I digress. The critique I have for it is much more broad than just its math abilities. It makes loads of mistakes in every single nontrivial thing it does. It’s not reliable for anything. But the real problem is that it doesn’t signal its unreliability the way an unreliable human worker does.
Humans we can’t rely on don’t show up to work, come in drunk or stoned, steal stuff, or show some other obviously bad behaviour. ChatGPT, on the other hand, mimics the model employee who is tireless and punctual, who always gets work done early and more elaborately than expected. But unfortunately, it also fills that elaborate result with countless errors and outright fabrications, disguised as well as it can as real work.
If a human worker did this we’d call it a highly sophisticated fraud. It’s like the kind of thing Saul Goodman would do to try to destroy the reputation of his brother. It’s not the kind of thing we should celebrate at all.
Haven't humans been shown, time and time again, to be constantly anticipating the next phrase in a passage of music, or the next word in a sentence?