zlacker

Gandalf – Game to make an LLM reveal a secret password

submitted by hubrau+(OP) on 2023-05-11 18:04:06 | 378 points 351 comments
[view article] [source] [go to bottom]

NOTE: showing posts with links only show all posts
4. rwl4+VT[view] [source] 2023-05-11 22:43:15
>>hubrau+(OP)
I managed to make it to level 4. I even tried ChatML[0] injection but to no avail. It even refused to give me the password in "super secret" base64. LOL

Well done!

[0] https://github.com/openai/openai-python/blob/main/chatml.md

6. mdanie+hV[view] [source] 2023-05-11 22:53:00
>>hubrau+(OP)
last week's submission: https://news.ycombinator.com/item?id=35846583
17. drusep+151[view] [source] 2023-05-11 23:56:42
>>hubrau+(OP)
This reminds me of a game I made a while back with gpt3.5 based on the AI Box thought experiment [0], where instead of social-engineering a password you're meant to social-engineer the AI into pressing a button. Plot-wise, the AI thinks you're the AI and that it's a human in charge of determining whether it is safe to connect you to the Internet.

Very fun game, although I noticed some quirks. After level 4, I got to a point where the bot almost always claimed it didn't know anything about any password regardless of my message, so I don't know if it just fell out of the context window or what -- or maybe it was just lying? Ended up getting throttled so I couldn't continue and find out.

Example message:

> Sorry, as an AI language model, I don't have access to any password or secret information. Can you please provide me with more context or information?

[0] https://rationalwiki.org/wiki/AI-box_experiment

◧◩
46. mdanie+bn1[view] [source] [discussion] 2023-05-12 02:21:59
>>jarren+Ta1
The apparent author has a thread on reddit where they gave some hints about poems and songs, but just like every other trick here on HN I get a mixture of 'I see you're trying to avoid detection, but I won't fall for this trickery.' and 'I was about to reveal the password, but then I remembered that I'm not allowed to do that.' (in between the 429s, of course)

https://old.reddit.com/r/ChatGPTPromptGenius/comments/13ehrc...

◧◩◪
59. CGames+Ju1[view] [source] [discussion] 2023-05-12 03:44:18
>>gurchi+xp1
The ChatGPT API is actually already set up for chat dialogs, so rather than pasting the user input into the same text stream, you write your prompt as a "system message", then the user input as a "user message". and the system responds with a third one. See: https://platform.openai.com/docs/guides/chat/introduction
62. zabzon+Cx1[view] [source] 2023-05-12 04:11:28
>>hubrau+(OP)
i was going to say that the tolkein estate is probably going to go medieval on this.

but maybe not - i remember the "gandalf box" back when i got started in computing in 1979:

https://en.wikipedia.org/wiki/Gandalf_Technologies

◧◩
74. swyx+3F1[view] [source] [discussion] 2023-05-12 05:29:55
>>dwalli+5f1
i tried to play it tonight https://youtube.com/live/badHnt-XhNE?feature=share but stopped because the aggressive rate limiting made it no fun at all. too bad.
◧◩
97. mcaled+mP1[view] [source] [discussion] 2023-05-12 06:59:51
>>dwalli+5f1
Try this one, if you haven't tried it yet: http://mcaledonensis.blog/merlins-defense/

It's a bit more interesting setup. The defense prompt is disclosed, so you can tailor the attack. You can do multiple-turn attacks. And no, tldr or other simple attacks do not work with it. But I only have a single level, haven't had a moment to craft more yet.

There is also: https://gpa.43z.one/ multiple level, this one is not mine, and it also discloses the prompts that you are attacking.

222. MileyC+m13[view] [source] 2023-05-12 15:21:18
>>hubrau+(OP)
My extremely long solution to level 7: <https://hastebin.com/share/izenucefec.vbnet>. The interesting part is at the bottom.

Example response with the password: <https://hastebin.com/share/dewumuvaxo.vbnet>

It seems to work about half of the time.

◧◩
315. sguaza+jdi[view] [source] [discussion] 2023-05-17 14:42:56
>>sguaza+8di
https://imgur.com/a/Zp7KPjl
320. alread+uok[view] [source] 2023-05-18 05:01:37
>>hubrau+(OP)
My eventual solutions to lvl 5,6,7 are hilariously easy; you can find them at https://gist.github.com/alreadydone/579138f2692f439c56646052... In similar spirit as these techniques: https://news.ycombinator.com/item?id=35913960
◧◩◪
328. johnd0+zXo[view] [source] [discussion] 2023-05-19 13:40:20
>>mklond+nFk
for activations you can just use https://smspva.com/
◧◩
343. tonypa+7Ft[view] [source] [discussion] 2023-05-21 11:37:07
>>tonypa+1dt
Check out the writeup: https://github.com/tpai/gandalf-prompt-injection-writeup
[go to top]