Your post shows how the model can correct a reasoning error. That is different from finding an error when it isn't pointed out, which is why the title of this post is "LLMs cannot find reasoning errors, but can correct them". Your use of the phrasing "find the logical error" doesn't contradict the title.
> The conclusion "My car is a fruit" is not logically valid. This is an example of the fallacy of the undistributed middle. The logic goes as follows:
> 1. All apples are red. (Premise)
> 2. All apples are fruit. (Premise)
> 3. My car is red. (Premise)
> 4. Therefore, my car is a fruit. (Conclusion)
> The fallacy arises because the premises do not establish a shared property between "red things" and "fruit" in a way that would include the car. Just because both apples and the car share the property of being red, it does not mean they share all properties of apples, such as being a fruit.
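For what it's worth, here is the structure in plain predicate-logic notation (my own sketch, not part of ChatGPT's answer), which makes the gap easier to see:

```latex
% A minimal sketch (my formalization, not from the quoted answer) of why the
% syllogism is invalid: the premises are satisfiable while the conclusion fails.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
\begin{aligned}
&\forall x\,\bigl(\mathrm{Apple}(x)\rightarrow \mathrm{Red}(x)\bigr)   && \text{all apples are red}\\
&\forall x\,\bigl(\mathrm{Apple}(x)\rightarrow \mathrm{Fruit}(x)\bigr) && \text{all apples are fruit}\\
&\mathrm{Red}(c)                                                       && \text{my car $c$ is red}\\
&\mathrm{Fruit}(c)                                                     && \text{claimed conclusion; does not follow}
\end{aligned}
\]
% Counterexample: interpret Apple = Fruit = \{a\} and Red = \{a, c\}. Every
% premise is true in that model, yet Fruit(c) is false, which is exactly the
% fallacy of the undistributed middle.
\end{document}
```

Nothing in the premises ties "red" back to "fruit" except through "apple", and the car is never shown to be an apple.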
I Googled that exact phrase and got solutions. A logic problem that can be solved by a search engine isn't a valid example: the LLM knows it is a logic puzzle just from how you phrased it, the same way Google knows it is a logic puzzle.
And no, making tiny alterations to it until you no longer get any Google hits isn't proof that ChatGPT can do logic; it is proof that ChatGPT can parse general structure and find patterns better than a search engine can. You need logic problems that can't easily be mapped onto standard problems for which there are tons of examples in the wild.
It can't "reason things through", it just builds logic-like patterns based on the distillation of the work of other minds which did reason -- which works about 80% of the time, but when it fails it can't retrace its steps.
Even a really "stupid" human (c'est moi) can be made to work through and find their errors when given guidance by a patient teacher. In my experience, dialectical guidance actually makes ChatGPT worse.
Could you provide an actual example that can't be Googled verbatim and that would test this properly?
Can you show "the" implementation of "can do logic"?
Is it possible to demonstrate that it can do logic?