The papers were linked in another comment. Three of them don't even have anything to do with testing on an existing dataset. So yeah, actual.
For the world model papers:
https://arxiv.org/abs/2210.13382
https://arxiv.org/abs/2305.11169
>Lack of access to cameras or vehicle controls isn't why it can't drive a car.
It would be best to wait until what you say can be evaluated. That is your hunch, not fact.
>The existence of numerous ChatGPT jailbreaks is evidence to the contrary.
No, it's not. People fall for social engineering and do what you ask. If you think people can't be easily derailed, boy, do I have a bridge for you.
>Many people are of below average intelligence, or give up when something is hard but not impossible.
Ok. That doesn't help your point, and many above-average people don't reach expert level either. If you want to rationalize all of that as "gave up when it wasn't impossible," go ahead lol, but reality paints a very different picture.
>If you have one machine that will make one attempt to solve a problem a day and succeeds 90% of the time and another that will make a billion attempts to solve a problem a second and succeeds 10% of the time, which one has solved more problems by the end of the week?
"Problems" aren't made equal. Practically speaking, it's very unlikely the billion per second thinker is solving any of the caliber of problems the one attempt per day is solving. Solving more "problems" does not make you a super intelligence.
For anyone following along, the papers are in my sibling comment; they're linked there[0]. The exact same conversation is happening there, but sourced.
> Three of them don't even have anything to do with testing on an existing dataset
Specifically, I address this claim and give strong evidence for why you should doubt it, especially with this exact wording. The short version is that when you scrape the entire internet for your training data, you have a lot of overlap, and you can't confidently call these evaluations "zero-shot." All experiments performed in the linked works use datasets that are not significantly different from data found in the training set. For those that are "hand written," see my complaints (linked) about HumanEval.
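(To make "overlap" concrete: here's a minimal sketch of the kind of n-gram contamination check used to flag train/eval overlap. Word-level 13-grams echo the approach described in the GPT-3 paper, but the function names and corpora here are illustrative, not taken from any of the linked works:)

```python
def ngrams(text, n=13):
    """Word-level n-grams of a lowercased document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(train_docs, eval_docs, n=13):
    """Fraction of eval documents sharing at least one n-gram with the training corpus."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    return sum(1 for doc in eval_docs if ngrams(doc, n) & train_grams) / len(eval_docs)
```

Any nonzero rate on an eval set undercuts a "zero-shot" claim for that set.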
LLMs aren't even the right kind of thing to drive a car. We have AIs that attempt to drive cars, with full access to cameras and vehicle controls, and they still crash into stationary objects.
> No, it's not. People fall for social engineering and do what you ask. If you think people can't be easily derailed, boy, do I have a bridge for you.
Social engineering works because most human interactions aren't malicious and the default expectation is that any given one won't be.
That's a different thing from explicitly pointing out that this text in particular is confirmed malicious and must not be heeded, and then having the model immediately proceed to do it anyway.
And yes, you can always find that one guy, but that's this:
> Many people are of below average intelligence
It has to beat the median because if you go much below it, there are people with brain damage. Scoring equal to someone impaired or disinclined to make a minimal effort isn't a passing grade.
> "Problems" aren't made equal. Practically speaking, it's very unlikely the billion per second thinker is solving any of the caliber of problems the one attempt per day is solving.
The speed is unrelated to the difficulty. You get from one a day to a billion a second by running it on a thousand supercomputers instead of a single dated laptop.
So the percentages are for problems of equal difficulty.
This is infinite monkeys on infinite typewriters. Except we don't actually have infinite monkeys or infinite typewriters, so an AI that is sufficiently terrible can't be made great by any feasible amount of compute. Whereas one that is merely mediocre and fails 90% of the time, or even 99.9% of the time, can have its failures compensated for in practice with brute force.
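(Back-of-the-envelope for that: assuming independent attempts with per-try success rate p, the chance of at least one success in N tries is 1 - (1 - p)^N. A sketch, using the rates from this thread:)

```python
def p_any_success(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts

print(p_any_success(0.001, 10_000))  # ~0.99995 -- 99.9% failure, brute-forced
print(p_any_success(0.10, 100))      # ~0.99997 -- the 10% machine barely needs compute
print(p_any_success(0.0, 10**12))    # 0.0      -- "sufficiently terrible" never gets there
```

Brute force rescues any nonzero p quickly, but no number of attempts rescues p = 0.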
But there are still problems that ChatGPT can't even solve 0.1% of the time.