Basically, if humans have had meaningful discussions about a topic, the product of their reasoning is there in the training data for the LLM, right?
Seems to me, the "how many R's are there in the word strawberry?" problem is very suggestive of the idea that LLM systems cannot reason. If they could, the question would not be difficult.
The fact is, humans may never have actually discussed that topic in any meaningful way that got captured in the training data.
And because of that, and because of how specific the question is, the LLM has no clear relationships to map into a response. It just does its best case: whatever the math deems most probable.
That seems plausible enough to support the opinion that LLMs cannot reason.
What we do know is that LLMs can work with anything expressed in terms of relationships between words.
There are a ton of reasoning templates contained in that data.
Put another way:
Maybe LLM systems are poor at deduction, save for the examples contained in the data. But there are a ton of examples!
So that weakness is hard to notice.
Maybe LLM systems are fantastic at inference! And so those many examples get mapped onto the prompt at hand very well.
And we do notice that, and see it as real thinking, not just some horribly complex surface containing a bazillion relationships...
Other examples exist.
[0] That example is due to tokenization. D'oh! I knew better, too.
Ah well.
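For what it's worth, here's a minimal sketch of what I mean, assuming OpenAI's tiktoken package and the cl100k_base encoding (both are just my choices for illustration; the exact splits depend on whatever tokenizer the model actually uses):

    # Sketch: a BPE tokenizer hands the model chunks, not letters,
    # so "count the R's" has no direct representation in its input.
    # Assumes tiktoken is installed (pip install tiktoken).
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    word = "strawberry"
    token_ids = enc.encode(word)

    # Show the pieces the model actually operates on.
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{word!r} -> {pieces}")

    # Counting characters over the raw string is trivial in ordinary code.
    print(word.count("r"))  # 3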
I think it's the counting parts of problems that current models are shaky with, and I imagine that's a training-data problem.