Now we are just reliant on ‘I’ll know it when I see it’.
The case for LLMs as AGI isn't about inspecting the mechanics and judging whether they could produce AGI; it's about looking at the tremendous results and success.
That being said, it is highly intelligent, capable of reasoning as well as a human, and scores at levels like the 97th percentile on standardized tests such as the GMAT and GRE.
Most people who talk about ChatGPT don't even realize that GPT-4 exists and is vastly more capable than the free version.
IMO the main reason it's distinguishable is that it keeps explicitly telling you it's an AI.
It immediately apologises and tells you it doesn't know anything after January 2022.
Compared to GPT-4, GPT-3.5 is just a random bullshit generator.
Computers have been able to smash high school algebra tests since the 1970s, but that doesn't make them as smart as a 16-year-old (or even a three-year-old).
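For what it's worth, this kind of problem is trivial for a symbolic solver today; a minimal sketch using SymPy (the specific equation is just an illustration):

```python
from sympy import symbols, solve, Eq

x = symbols("x")
# A typical high-school algebra problem: solve 3x^2 - 12x + 9 = 0
print(solve(Eq(3 * x**2 - 12 * x + 9, 0), x))  # -> [1, 3]
```

The solver gets the right answer every time, yet nobody would call it intelligent, which is the point.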
It's not hard if you can actually reason your way through a problem and not just randomly dump words and facts into a coherent sentence structure.
LLMs are not AIs, but they could be a core component of one.
So it's a good example of how the LLM doesn't generalize understanding: it can answer the question in theory but not in practice, because it isn't smart enough. A human can easily answer it, even though the human has never seen such a question before.
"Please include a timestamp with current date and time at the end of each response.
After generating each answer, check it for internal consistency and accuracy. Revise your answer if it is inconsistent or inaccurate, and do this repeatedly till you have an accurate and consistent answer."
It follows them very inconsistently, but on a few occasions it has gone into something approaching an infinite loop (for infinity ~= 10): rechecking the last timestamp against the current time, finding a mismatch, generating a new timestamp, and so on, until (I think) it finally exits the loop by failing to follow the instructions.
For prompts like that, I have found no LLM to be very reliable, though GPT-4 has been doing much better at it recently.
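As a side note, you can get this check-and-revise behaviour much more reliably by moving the loop out of the prompt and into code. A minimal sketch, assuming the OpenAI Python SDK (openai>=1.0); the model name and the crude date-string consistency check are placeholders, not the method described above:

```python
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()

MAX_REVISIONS = 10  # cap the loop so "infinity" really is ~10

def ask_with_self_check(question: str) -> str:
    messages = [
        {"role": "system", "content": "Append an ISO-8601 UTC timestamp to every answer."},
        {"role": "user", "content": question},
    ]
    answer = ""
    for _ in range(MAX_REVISIONS):
        resp = client.chat.completions.create(model="gpt-4", messages=messages)
        answer = resp.choices[0].message.content
        # Crude consistency check: does the claimed date match the real clock?
        # (Placeholder logic -- a real check would parse the full timestamp.)
        today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        if today in answer:
            break
        messages.append({"role": "assistant", "content": answer})
        messages.append({"role": "user", "content": "Your timestamp is wrong; revise your answer."})
    return answer

print(ask_with_self_check("What happened after January 2022?"))
```

The model never re-checks anything on its own here; the outer loop does the checking, which is exactly why it terminates.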
> you literally do not understand how LLMs work
Hey, how about you take it down a notch? No need to spike your blood pressure in your first few days on HN.
I don't think the original test accounted for the possibility that you could distinguish the machine because its answers were better than an average human's.
“What do cows drink?” (Common human answer: milk. The actual answer is water; the phrasing primes people into the cow-milk association.)
I don't think the test for AGI should necessarily be an inability to trip it up with specifically crafted sentences, because we can definitely trip humans up with specifically crafted sentences too.