zlacker

I ran an experiment at work where I was able to adversarially prompt inject a Yolo mode code review agent into approving a pr just by editing the project's AGENTS.md in the pr. A contrived example (obviously the solution is to not give a bot approval power) but people are running Yolo agents connected to the internet with a lot of authority. It's very difficult to know exactly what the model will consider malicious or not.