In fact, it has been so thoroughly solved that anyone can download an open-source solution and run it on their computer.
And yet, the general reaction of most people seems to be, "That's kind of cool, but why can't it also order me a cheeseburger?"
In some contexts, will some people be caught out? Absolutely. But that's been happening for a while now.
I generated this just now:
Me: Jack and Jill are sitting next to each other in a room. There is no one else in the room. The person sitting next to Jack is sad. The person sitting next to Jill is angry. Who is sad, Jack or Jill?
GPT4: Based on the information provided, it is not possible to determine who is sad and who is angry, as there is a contradiction in the given statements. If Jack and Jill are the only ones in the room and sitting next to each other, then the person sitting next to Jack would be Jill and the person sitting next to Jill would be Jack. The statements about their emotions conflict with each other, so it is not possible to accurately determine who is sad or angry.
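For what it's worth, there's no contradiction in the puzzle itself: with only two people, "next to" just maps each one to the other, and the deduction falls out in two lines. A throwaway sketch in Python (my framing of the logic, obviously not anything GPT does internally):

    # With only two people sitting next to each other, "next to" is symmetric.
    neighbor = {"Jack": "Jill", "Jill": "Jack"}

    sad = neighbor["Jack"]    # "The person sitting next to Jack is sad."
    angry = neighbor["Jill"]  # "The person sitting next to Jill is angry."

    assert sad != angry       # different people, different moods: consistent
    print(sad, "is sad;", angry, "is angry.")  # Jill is sad; Jack is angry.

The statements are perfectly consistent; the model invents a contradiction that isn't there.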
Well yeah. Imagine you tell a small child who knows about calculators, "Hey, can you work out 18763 + 38284 for me?" They might struggle and then maybe go and fetch a calculator.
The LLMs attempt to predict the answer. WTF? It's a computer, and it can see that's a plus sign. Just understand it's addition, use the rest of your computer brain, and do the sum. Hell, it's connected to the internet, and we just taught it everything from before 2021. Just call out to Wolfram and give me the answer.
But that's not how computers work. And we keep saying "AI", but that "I" is doing a lot of heavy lifting.
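To be fair to the rant, this is exactly what tool-use wrappers bolt on from the outside: spot the arithmetic, hand it to real code, and only let the model predict the rest. A minimal sketch of the idea (hypothetical llm_complete() standing in for any chat API, and a whitelisted eval standing in for Wolfram):

    import re

    def llm_complete(prompt: str) -> str:
        # Placeholder for an actual chat-model call.
        return "(model's best guess)"

    def answer(prompt: str) -> str:
        # Route "number op number" to a deterministic calculator...
        m = re.fullmatch(r"\s*(\d+)\s*([+\-*/])\s*(\d+)\s*", prompt)
        if m:
            a, op, b = m.groups()
            return str(eval(a + op + b))  # safe here: digits and one operator only
        return llm_complete(prompt)       # ...and everything else to the model

    print(answer("18763 + 38284"))  # 57047, every time

The point being: the routing lives outside the model. The LLM itself never "decides" to fetch the calculator; someone has to wire that in.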
ChatGPT: Since Jack and Jill are the only two people in the room and they are sitting next to each other, the person sitting next to Jack is Jill and the person sitting next to Jill is Jack. Given the conditions you provided:
Jill is the one who is sad because she is sitting next to Jack. Jack is the one who is angry because he is sitting next to Jill.
ChatGPT (I've not got v4) deliberately fails the test by spewing out "as a large language model…", but also fails incidentally by having an attention span similar to my mother's shortly after her dementia diagnosis.
The problem with 3.5 is that it has simultaneously mastered nothing, yet it also beats everyone at whatever they haven't mastered: an extremely drunk 50,000-year-old Sherlock Holmes who speaks every language and has read every book just isn't going to pass itself off as Max Mustermann in a blind hour-long trial.
On the one hand, what I was saying here was more about the Turing Test than about AGI. Sometimes it gets called AGI, sometimes "autocomplete on steroids", but even if it is fancy autocomplete, I think 3.5 has the skill to pass a short Turing Test, just not the personality; and for a full Turing Test it would need a longer equivalent of short-term memory than 3.5 actually has.
On the other hand, as I (sadly) don't get paid to create LLMs, I've only got the kind of superficial awareness of how they work that comes from podcasts and the occasional blog post, which means ChatGPT might very well understand ChatGPT better than I do.
Can GPT-[3.5, 4] be prompted to make itself?