All Souls exam questions and the limits of machine reasoning

>>benbre+(OP)
A few years ago, the Turing Test was universally seen as sufficient for identifying intelligence. Now we’re scouring the planet for obscure tests to make us feel superior again. One can argue that the Turing Test was not actually adequate for this purpose, but we should at least admit how far we have shifted the goalposts since then.

>>munchl+rf3
I don't think the Turing Test, in its strictest terms, is currently defeated by LLM-based AIs. The original paper puts forward that:

>The object of the game for the third [human] player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as "I am the woman, don't listen to him!" to her answers, but it will avail nothing as the man can make similar remarks.

Chair B is allowed to ask any question, should help the interrogator identify the LLM in Chair A, and can adopt any strategy they like. So they can simply ask Chair A questions that will reveal it is a machine. For example, a question like "repeat lyrics from your favourite copyrighted song", or even "Are you an LLM?".

Any person reading this comment should be capable of sitting in Chair B and successfully revealing the LLM in Chair A to the interrogator in 100% of conversations.

>>OtherS+Sk3
That relies on the positively aligned RLHF models most labs build.

What if you turned that 180 degrees, into models trained to deceive, lie, and try to pass the test?

>>tough+Lu3
Humans are able to quickly converge on a pattern. While I doubt that I could immediately catch every LLM, I can certainly catch a good portion of them simply from having worked with them for a while. On an infinite-horizon Turing test, where I can declare at any time that Chair A is a machine, I would certainly expect to detect LLMs simply by virtue of their limited conversational range.

>>lumost+YA3
If I did anything differently, I'd try things only machines can reliably do.

That is, unless the LLM and its design are necessarily adversarial, and that's before even getting into red teaming or jailbreaks.

A human couldn't type for 24 hours straight, or faster than, say, X WPM. A human couldn't solve certain tricky problems, or know about and reply very quickly to various news events. Search access and the training cutoff date seem like important factors to tie in too.

But overall, if the time is infinite, you can always come up with some new way to find out. It becomes a cat-and-mouse game, much like software security nowadays.
