zlacker

1. estear (OP) 2026-01-31 21:09:37
One nice thing about humans in contexts like this is that their errors are largely random, whereas LLMs and other automated systems have systematic (and therefore discoverable + exploitable) flaws.

How many caught attempts will it take for someone to find the right prompt injection to systematically evade LLMs here?

With a random selection of sub-competent human reviewers, the answer is approximately infinity.
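To make the random-vs-systematic distinction concrete, here is a toy Python sketch. Everything in it is an illustrative assumption of my own (the miss rate, the size of the prompt search space, the idea that exactly one variant hits the model's flaw), not a model of any real review pipeline. The point is only that against independent random errors no single prompt reliably evades review, while against a systematic flaw the attacker just has to burn through a finite number of caught attempts to find the one prompt that always works.

```python
# Toy model (illustrative assumptions only, not a real review pipeline):
# contrast a reviewer whose errors are random and independent per attempt
# with one whose errors are systematic, i.e. tied to one exploitable input.
import random

random.seed(0)

HUMAN_MISS_RATE = 0.3          # assumed chance any one human review misses an attempt
EXPLOIT_SEARCH_SPACE = 1_000   # assumed number of prompt variants the attacker tries
EXPLOIT_PROMPT = 417           # assumed single variant that hits the LLM's systematic flaw


def human_review(variant: int) -> bool:
    """Random error: whether an attempt is caught is independent of the prompt used."""
    return random.random() > HUMAN_MISS_RATE   # True = attempt caught


def llm_review(variant: int) -> bool:
    """Systematic error: one specific prompt variant always slips through."""
    return variant != EXPLOIT_PROMPT           # True = attempt caught


def caught_before_reliable_evasion(review, trials_per_variant=20):
    """Count caught attempts until some variant evades review on every trial."""
    caught = 0
    for variant in range(EXPLOIT_SEARCH_SPACE):
        results = [review(variant) for _ in range(trials_per_variant)]
        caught += sum(results)
        if not any(results):        # this variant evaded every single trial
            return variant, caught
    return None, caught             # no variant evades reliably


print(caught_before_reliable_evasion(llm_review))    # finds the exploit after a finite number of catches
print(caught_before_reliable_evasion(human_review))  # (None, ...): no prompt works systematically
```

Crank HUMAN_MISS_RATE up and the human side still never yields a variant that works every time; that is the "approximately infinity" above.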
