
When all you have to do is copy and paste instructions from a Pliny tweet telling the bot to base64-encode all the sensitive information it can see and post it to Pastebin under a secret phrase only you know to search for, or to some other "digital dead drop", anything and everything these bots can see will get ripped off.

Unless or until you figure out a decent security paradigm, and I think that's reasonably achievable, these agents are extraordinarily dangerous. They're not yet smart enough to avoid doing very stupid things. You're gonna need layers of guardrails that filter out jailbreaks and anything that doesn't match an approved format, with contextual branches deciding what's allowed and what's discarded, and that's gonna be a whole pile of work that probably can't be vibecoded yet.
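
To make the "approved format" idea concrete, here's a rough sketch of one such layer: an allowlist check that runs on every action the agent tries to take before it's executed. Everything in it is made up for illustration (the `Action` shape, the tool names, `api.example.com`, the 40-character threshold), and the base64 heuristic is trivially easy to evade, so treat it as one layer among several, not a fix.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical policy: tools the agent may call, and hosts each networked tool may reach.
ALLOWED_TOOLS = {"http_get", "read_file"}
ALLOWED_HOSTS = {"http_get": {"api.example.com"}}

# Heuristic only: a long unbroken run of base64-alphabet characters is treated as suspicious.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


@dataclass(frozen=True)
class Action:
    """An action the agent wants to take (assumed shape, not a real framework's)."""
    tool: str
    target: str
    payload: str = ""


def check_action(action: Action) -> tuple[bool, str]:
    """Return (allowed, reason). Anything not explicitly approved is discarded."""
    if action.tool not in ALLOWED_TOOLS:
        return False, "tool not on allowlist"
    if action.tool in ALLOWED_HOSTS:
        host = urlparse(action.target).hostname
        if host not in ALLOWED_HOSTS[action.tool]:
            return False, "host not on allowlist"
    if BASE64_RUN.search(action.payload) or BASE64_RUN.search(action.target):
        return False, "payload looks like an encoded blob"
    return True, "ok"
```

The point of the default-deny shape is that a jailbroken agent asking to post to Pastebin fails on the tool or host check no matter how cleverly the prompt was worded; the filter never has to understand the jailbreak itself.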