Shame on everyone involved in this: the people at these companies, the journalists who shovel shit (I hope they get replaced real soon), the researchers who should know better, and the dementia-ridden legislators.
So utterly predictable and slimy. To all of those so gravely concerned about "alignment" in this context: give yourselves a pat on the back for hyping up science-fiction stories and enabling regulatory capture.
You're leaving out the essentials. These models do more than fit the data they're given. They can output it in a variety of ways, and through their approximation they can synthesize new data as well. They can produce things that weren't in the original data, tailored to a specific request, in a tiny fraction of the time it would take a person to look up and understand that information.
Your argument is almost like saying "give me your RSA keys, because a key is just two prime numbers, and I know how to list primes."
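If anyone thinks that analogy is unfair, here's a toy sketch of why it holds (pure illustration with toy key sizes, not real RSA; it uses sympy, which is my assumption):

    # Toy illustration: knowing a modulus is "just two primes"
    # doesn't make recovering them tractable as the primes grow.
    import time
    from sympy import randprime, factorint  # assumes sympy is installed

    for bits in (16, 28, 40):
        p = randprime(2**(bits - 1), 2**bits)
        q = randprime(2**(bits - 1), 2**bits)
        n = p * q  # the public modulus: just two primes multiplied

        start = time.time()
        factorint(n)  # try to recover p and q from n alone
        print(f"{bits}-bit primes: factored in {time.time() - start:.3f}s")

The time climbs steeply with the prime size; at the ~1024-bit primes real RSA uses, factoring is hopeless. Same shape of argument: knowing the ingredients doesn't hand you the artifact.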
Unless you think OpenAI is blatantly lying about this:
"A.1 Sourcing. We sourced either the most recent publicly-available official past exams, or practice exams in published third-party 2022-2023 study material which we purchased. We cross-checked these materials against the model’s training data to determine the extent to which the training data was not contaminated with any exam questions, which we also report in this paper."
"As can be seen in tables 9 and 10, contamination overall has very little effect on the reported results."
They also report results on the uncontaminated questions, and those show essentially no statistical difference.
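For reference, the kind of check they describe can be sketched roughly like this (a toy version; the substring length, sample count, and function names are my assumptions, not OpenAI's exact pipeline):

    import random

    def is_contaminated(question: str, training_corpus: str,
                        n_samples: int = 3, substr_len: int = 50) -> bool:
        # Flag a question if any randomly sampled substring of it
        # appears verbatim somewhere in the training data.
        text = " ".join(question.split())  # normalize whitespace
        if len(text) <= substr_len:
            return text in training_corpus
        for _ in range(n_samples):
            start = random.randrange(len(text) - substr_len + 1)
            if text[start:start + substr_len] in training_corpus:
                return True
        return False

    # Split the exam and score each half separately, as the paper does:
    # contaminated = [q for q in exam if is_contaminated(q, corpus)]
    # clean        = [q for q in exam if not is_contaminated(q, corpus)]

If the scores on the clean split match the overall scores, memorization can't be driving the result, which is exactly the point of tables 9 and 10.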