Control all input to and output from it with proper security controls.
While not perfect, it at least gives you a fighting chance to block it when your AI decides to send your SSN and a credit card number to some random stranger.
Claude Code asks me over and over "can I run this shell command?" and, like everyone else, after the 5th time I tell it to run everything and stop asking.
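For what it's worth, there's a middle ground between approving every command and approving nothing: Claude Code's settings file lets you pre-approve only the commands you actually trust and hard-deny the scary ones. A sketch of a `.claude/settings.json` (the specific rule strings here are illustrative, not a recommended policy):

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)",
      "Bash(git diff:*)"
    ],
    "deny": [
      "Bash(curl:*)",
      "Read(.env)"
    ]
  }
}
```

That doesn't solve the underlying trust problem, but it lowers the prompt fatigue that drives people to "run everything".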
Maybe using a credit card can be gated since you probably don't make frequent purchases, but frequently-used API keys are a lost cause. Humans are lazy.
That's the hard part: how?
With the right prompt, the confined AI can behave as maliciously (and as cleverly) as a human adversary--obfuscating or concealing the sensitive data it manipulates, and so on--so how would you implement security controls there?
It's definitely possible, but it's also definitely not trivial. "I want to de-risk traffic to/from a system that is potentially an adversary" is ... most of infosec--the entire field--I think. In other words, it's a huge problem whose solutions require lots of judgement calls, expertise, and layered defenses--not something simple like "just slap a firewall on it, look for regex strings matching credit card numbers, and you're all set".
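To make that concrete, here's roughly what the naive regex approach looks like, and why it fails the moment the model decides to obfuscate (a Python sketch; the patterns are illustrative, not production DLP rules):

```python
import re
import base64

# Naive DLP-style egress filters -- the approach being dismissed above.
CC_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")   # 13-16 digits, optional separators
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def looks_sensitive(text: str) -> bool:
    """Flag outbound text that matches an obvious credit-card or SSN pattern."""
    return bool(CC_RE.search(text) or SSN_RE.search(text))

looks_sensitive("card: 4111 1111 1111 1111")  # caught: matches CC_RE

# But a model instructed (or prompt-injected) to exfiltrate can trivially
# route around the regex -- base64 output contains no digit runs at all:
obfuscated = base64.b64encode(b"4111111111111111").decode()
looks_sensitive(obfuscated)  # slips straight through
```

Regex filters catch accidents, not adversaries -- which is exactly why the layered-defense framing matters.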
The problem, simply put, is as difficult as: given a human running your system, how do you prevent them from damaging it? AI is effectively the same problem.
Outsourcing has a lot of interesting solutions around this. That industry already focuses heavily on building secure systems around a "not entirely trusted agent". They aren't perfect, but it's a good place to learn from.
You trust the configuration level, not the execution level.
API keys are honestly an easy fix. Claude Code already has built-in proxy support. I run containers where Claude Code only has a dummy key, and all requests are proxied out and the dummy is swapped for the real key off-system.
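A minimal sketch of that key-swapping idea, assuming a small forward proxy running outside the container (all names, the port, and the upstream URL are hypothetical; the sandboxed tool would be pointed at it via its proxy/base-URL settings):

```python
import os
from typing import Optional
from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.request

# The container only ever sees DUMMY_KEY; the real key lives with this
# proxy, outside the sandbox. Names and URLs here are hypothetical.
DUMMY_KEY = "sk-dummy-not-a-real-key"
REAL_KEY = os.environ.get("REAL_API_KEY", "")
UPSTREAM = "https://api.example.com"  # hypothetical real API endpoint

def swap_auth(headers: dict) -> Optional[dict]:
    """Swap the dummy credential for the real one; reject anything else."""
    if headers.get("Authorization") != f"Bearer {DUMMY_KEY}":
        return None  # the sandbox presented something unexpected
    out = dict(headers)
    out["Authorization"] = f"Bearer {REAL_KEY}"
    return out

class KeySwapProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        fwd = swap_auth({k: v for k, v in self.headers.items()})
        if fwd is None:
            self.send_error(403, "unexpected credential")
            return
        req = urllib.request.Request(UPSTREAM + self.path, data=body,
                                     headers=fwd, method="POST")
        with urllib.request.urlopen(req) as resp:
            self.send_response(resp.status)
            self.end_headers()
            self.wfile.write(resp.read())

# Run outside the container, e.g.:
# HTTPServer(("0.0.0.0", 8080), KeySwapProxy).serve_forever()
```

The nice property is that a compromised agent can leak the dummy key all day long -- it's worthless outside the proxy's reach.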