One could argue that a common characteristic of the above exams is that they each test memory, and that, as such, GPT-4's above-average performance is not necessarily evidence of "reasoning". That is, GPT-4 has no "understanding", but it has formidable reading speed and retention (memory).
While preparation for the above exams depends heavily on memorisation, other exams may focus more on reasoning and understanding.
Surely GPT-4 would fail some exams. But when it comes to GPT-4's exam performance, only the positive results are reported.
Some? It does hilariously badly on basic math.
With confidence, though.
How does it do on the GMAT? Sounds like a good candidate for an MBA program.
I don't think they will make this argument, since it would heavily undercut their (expected) claim that they're not open-sourcing the model because of safety concerns.
The default is failing the exams. I'd be no less impressed if they came right out and said "This is a short list of the only exams it passes" simply because (IMO) it's remarkable that a machine could pass any of those exams in the first place. Just a couple years ago, it would have been outlandish for a machine to even have a double digit score (at best!).
If we've already found ourselves in a position where passing grades on some exams that qualify people for their careers is unremarkable, I'll honestly be a bit disappointed. 99th percentile on the GRE Verbal would make an NLP researcher from 2010 have a damn aneurysm; if we're now saying that's "not reasoning" then we're surely moving the goalposts for what that means.
Update: GPT-4 Turbo is now up to about 770 on the GMAT, beating most humans https://twitter.com/airesearchtools/status/17569731696325880...
The Advanced Sommelier exam is part of a process that involves taking other exams and courses. The GRE Verbal is usually taken in combination with other GRE parts and used to apply for entry into a program where years of further study may be required. The UBE normally follows years of study in an approved program. (Back in the day, some people used to take the MBE, which is now a part of the UBE, while they were still in school because the exam was so easy: it was a standardised, multiple-choice test.)
The plaintiffs must make the argument that GPT-4 is "AGI" because the alleged agreement to form OpenAI was focused on "AGI" specifically, not simply development and improvement of LLMs. If OpenAI has not yet achieved AGI, then arguably the alleged agreement does not restrict whatever it is doing now. It only applies to "AGI".
It seems like you have a very low bar for "elite", a very limited definition of "math", and a very peculiar one of "better at".
Yeah, and I just spent $20 to call BS on what you said.
GPT-4 just bullshits with confidence. I don't see how a code interpreter can possibly help here.
Behold:
======================
Me: I drew three arrows on a piece of paper starting at the same point, pointing left, up, and diagonally (up and right). Do they represent linearly independent vectors?
---------------------
ChatGPT: In this case, you have described three different arrows pointing in unique directions:
One pointing left, which can be represented as a vector, say, [-1,0] in a two-dimensional space (using standard Cartesian coordinates).
One pointing up, which can be represented as a vector, say, [0, 1].
One pointing diagonally up and to the right, which could be represented as a vector, say, [1,1].
For these vectors to be linearly independent, no vector can be expressed as a linear combination of the others. In other words, there's no way to add or subtract scaled versions of any two vectors to get the third.
Looking at the vectors you've described:
The first vector represents movement purely in the horizontal direction. The second vector represents movement purely in the vertical direction. The third vector represents movement in both directions simultaneously.
Because none of these vectors can be created by scaling and adding the other two (for example, you can't add or subtract any combination of the first and second vectors to get the third one), they are linearly independent.
In mathematical terms, if we form a matrix with these vectors as columns, the determinant of this matrix would not be zero (a non-zero determinant indicates linear independence for the columns of the matrix). This is a more formal way to show their independence, but based on their directions and the explanation above, these vectors are indeed linearly independent.
It's hilarious.
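For what it's worth, the answer is wrong on its face: three vectors in a two-dimensional space can never be linearly independent, and a 2x3 matrix doesn't even have a determinant. A quick sanity check, using the exact vectors ChatGPT itself proposed (minimal sketch with numpy):

    import numpy as np

    # The three 2D vectors ChatGPT proposed: left, up, diagonal (up-right),
    # stacked as the columns of a 2x3 matrix.
    A = np.column_stack(([-1, 0], [0, 1], [1, 1]))

    # A 2x3 matrix has rank at most 2, so its three columns
    # cannot possibly be linearly independent.
    print(np.linalg.matrix_rank(A))  # -> 2, not 3

    # Explicitly: [1, 1] = -1 * [-1, 0] + 1 * [0, 1], i.e. the third
    # vector is a linear combination of the first two.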
Example
=======
Me: I have a small sack with yellow sand, and a big sack with blue sand of the same density.
I take some amount of sand from the small sack, and put it into the big sack. Then I take the same amount from the big sack, and put it into the small sack. Now both sacks contain some yellow and some blue sand. Is there more blue sand in the small sack or yellow sand in the big sack?
ChatGPT:
Let's denote the amount of sand transferred from one sack to the other and then back again as "X" kg. Here's what happens with each step:
After the first transfer: The small sack (originally all yellow sand) has its content reduced by X kg. The big sack (originally all blue sand) gains X kg of yellow sand but loses no blue sand.
After the second transfer: We take X kg from the big sack. Since this sack contains a mixture of blue and yellow sand, we need to determine the proportion of yellow and blue sand in this mixed X kg.
Before the second transfer, the big sack contains X kg of yellow sand and its original quantity of blue sand. Let's assume the original amount of blue sand doesn't change our calculation because we're considering the proportion of yellow to total sand when we take X kg out.
Therefore, when we take X kg out of the big sack, we're removing a proportion of yellow and blue sand based on their proportions in the sack at that moment. If the sack had a total mass of sand (before removing X) of "Original Blue + X", the proportion of yellow sand is X/(Original Blue + X), and the proportion of blue sand is "Original Blue"/(Original Blue + X).
However, the exact fraction of yellow and blue sands removed from the big sack will equate to the fractions present: i.e., if the sack had more blue than yellow, every scoop (in this case, the same amount X) will contain less yellow sand than blue sand by their ratios.
But here's the crucial understanding:
The amount of blue sand in the small sack is the amount of blue sand that was part of the X kg removed from the big sack during the second transfer.
The amount of yellow sand in the big sack after the second transfer is less than X, because some of the yellow sand (which came from the small sack originally) was taken back in the second transfer (since the mix removed contains both yellow and blue, not just yellow).
The critical mathematical realization comes from the mixture theory: when you remove a mix from a larger quantity containing varied parts, you dilute the concentration in the source. But since you're asking for a direct comparison after only a mutual exchange:
You will have removed less yellow from the big sack than you put in because you're removing a mix, meaning some of the original blue is also taken out. Thus, without needing to plug in specific values:
After the exchange, there will be more blue sand in the small sack than there is yellow sand in the big sack. This is because the portion taken back from the big sack contains both yellow and blue, reducing the amount of original yellow left in the big sack from X. Meanwhile, the small sack would have added to it an amount of blue sand equal to the portion of blue sand in the mixed X kg taken out during the second transfer, which has no prior removal, only an addition.
There's literally zero mathematics involved in that answer.
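And the conclusion is wrong: by the classic wine-and-water argument, the two amounts are exactly equal. The small sack ends up with the same total amount of sand it started with, so every grain of blue in it must have displaced a grain of yellow that is now sitting in the big sack. A quick simulation (minimal sketch, with made-up quantities) confirms it:

    small_yellow, big_blue = 1.0, 10.0   # made-up starting amounts (kg)
    x = 0.3                              # amount moved each way (kg)

    # First transfer: x kg of yellow from the small sack to the big one.
    small = {"yellow": small_yellow - x, "blue": 0.0}
    big = {"yellow": x, "blue": big_blue}

    # Second transfer: x kg of the now-mixed big sack goes back,
    # split in proportion to the mix.
    total = big["yellow"] + big["blue"]
    back_yellow = x * big["yellow"] / total
    back_blue = x * big["blue"] / total
    small["yellow"] += back_yellow
    small["blue"] += back_blue
    big["yellow"] -= back_yellow
    big["blue"] -= back_blue

    # Blue in the small sack vs. yellow in the big sack:
    print(small["blue"], big["yellow"])  # identical, for any x

Both prints come out to the same number (about 0.2913 kg here), no matter what values you pick.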
GPT still gets lots of logic puzzles wrong.
An example maths problem that it used to get wrong but now gets right is: "what is the square root of (5943/6) times 9"
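Part of the difficulty may be that the prompt itself is ambiguous: "the square root of (5943/6) times 9" can be read as taking the root first and then multiplying, or as multiplying inside the root. A quick check of both readings (plain Python, nothing model-specific):

    import math

    # Reading 1: root first, then multiply by 9.
    print(math.sqrt(5943 / 6) * 9)   # ~283.25

    # Reading 2: multiply by 9 inside the root.
    print(math.sqrt(5943 / 6 * 9))   # ~94.42

Since 5943/6 = 990.5 is not a perfect square, neither reading gives a tidy answer, so a model that merely pattern-matches has nothing memorised to fall back on.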