I have been using Claude Code for DevOps-style tasks like SSHing into servers, grepping logs, inspecting files, and querying databases.
Overall it's been great. However, I find myself having to review every single command, many of which are repetitive. It still saves me a ton of time, but it's quickly becoming a bit tedious.
I wish I could give the agent some more autonomy, like a list of pre-approved commands or actions that it's allowed to run over SSH.
For example:
OK: ls, grep, cat, tail
Not OK: rm, mv, chmod, etc.
OK: SELECT queries
Not OK: INSERT, DELETE, DROP, TRUNCATE
Has anyone successfully or satisfactorily solved this? What setups have actually worked for you, and where do you draw the line between autonomy and risk?
As for queries, you might be able to achieve the same thing with command-line tools if it's a `sqlite` database (I'm not sure about other SQL DBs). If you want even more control than settings.json allows, you can use the Claude Code SDK.
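For the SQLite case, here is a minimal sketch, assuming Python's standard `sqlite3` module and a hypothetical database file. Opening the file with the `mode=ro` URI parameter makes SQLite itself refuse writes, so a query helper handed to the agent cannot modify data even if it issues a DELETE. (The `sqlite3` CLI has a similar `-readonly` option.) The helper name and schema are made up for illustration.

```python
import os
import pathlib
import sqlite3
import tempfile

def run_readonly_query(db_path, sql, params=()):
    """Run one statement against db_path opened read-only (hypothetical helper)."""
    # mode=ro makes SQLite open the file read-only, so INSERT/UPDATE/DELETE/DROP
    # fail with sqlite3.OperationalError instead of modifying the file.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    # Build a throwaway database to demonstrate (hypothetical schema).
    path = os.path.join(tempfile.mkdtemp(), "app.db")
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    setup.execute("INSERT INTO users VALUES (1, 'ada')")
    setup.commit()
    setup.close()

    print(run_readonly_query(path, "SELECT name FROM users"))  # [('ada',)]
    try:
        run_readonly_query(path, "DELETE FROM users")
    except sqlite3.OperationalError as exc:
        print("blocked:", exc)
```

The nice property is that the restriction lives in the database engine, not in a list of allowed SQL keywords the agent could talk its way around.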
I'll set it loose on a development or staging system, but I wouldn't let it near a production system.
Don't forget your backups. There was the time I was upgrading the library management system at my Uni, sitting at the sysadmin's computer, and I ran a DROP DATABASE against the wrong db, which instantly brought down the production system. She took down a binder from the shelf behind me that had the restore procedures written down, and we had it back up in 30 seconds!
You cannot. The best you can ever hope for is creating VM environments, and even then it's going to surprise you sometimes. See https://gtfobins.github.io/.
https://stackoverflow.com/questions/35830509/sshfs-linux-how...
I'm not familiar with rbash, but it seems like it can do (at least some of) what you want.
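For reference, the usual rbash pattern (a sketch of common practice, not something tested against the setup in this thread) is: give the agent's account `/bin/rbash` as its login shell, set PATH in a startup file the account cannot modify to a single directory, and put symlinks to the allowlisted binaries there. Per the Bash manual, rbash then refuses to change PATH or run command names containing a slash. The helper below only builds that directory; its name and the location are hypothetical. As GTFOBins shows, some programs (pagers, editors) can escape to a full shell, so only link binaries that can't.

```python
import os
import shutil
import tempfile

def make_restricted_bin(bin_dir, commands):
    """Symlink each allowlisted command into bin_dir (hypothetical helper).

    Intended for an account whose login shell is rbash and whose PATH is set
    to bin_dir only. Commands not installed on this machine are skipped.
    Returns a dict mapping command name to the path found by shutil.which.
    """
    os.makedirs(bin_dir, exist_ok=True)
    linked = {}
    for name in commands:
        target = shutil.which(name)
        if target is None:
            continue  # not installed here; skip rather than fail
        link = os.path.join(bin_dir, name)
        if not os.path.lexists(link):
            os.symlink(target, link)
        linked[name] = target
    return linked

if __name__ == "__main__":
    # Hypothetical location; the allowlist is the one from the question.
    demo_dir = os.path.join(tempfile.mkdtemp(), "agent-bin")
    print(make_restricted_bin(demo_dir, ["ls", "grep", "cat", "tail"]))
```

Calling it again is harmless: existing links are left in place.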
I recommend giving LLMs credentials that are extremely fine-grained: credentials that permit only the actions you want to allow and nothing you don't.
Often it may be hard or impossible to do this with your database settings alone. In that case, you can use a proxy to separate the credentials the LLM/agent holds from the credentials actually used to connect to the DB. The proxy can then enforce what you want to allow or block.
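The proxy idea can be illustrated in-process. A minimal sketch, assuming SQLite and Python's `sqlite3` module stand in for a real database and a network proxy: the agent is only handed `agent_query()`, and SQLite's authorizer hook denies every action except reads, so enforcement happens in the engine rather than by string-matching SQL. In a real deployment the gate would be a separate process holding the real credentials; the function names here are hypothetical.

```python
import sqlite3

# Authorizer action codes allowed through: the SELECT statement itself,
# reading columns, and calling SQL functions such as count().
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}

def read_only_authorizer(action, arg1, arg2, db_name, trigger_or_view):
    # Called by SQLite while compiling each statement; anything else is denied.
    if action in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY

def make_gate(conn):
    """Wrap conn so the agent only gets a read-only query function.

    In this in-process sketch the closure hides conn only by convention;
    a real proxy would keep the connection in a different process.
    """
    conn.set_authorizer(read_only_authorizer)

    def agent_query(sql, params=()):
        # Denied statements fail with sqlite3.DatabaseError (or a subclass).
        return conn.execute(sql, params).fetchall()

    return agent_query
```

Set up the schema and data before installing the authorizer, since it applies to every later statement on that connection.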
SSH is trickier because commands are mixed in with all the other data going on in the bytestream during your session. I previously wrote another blog post about just how tricky enforcing command allowlists can be as well: https://www.joinformal.com/blog/allowlisting-some-bash-comma.... A lot of developer CLI tools were not designed to be run by potentially malicious users who can add arbitrary flags!
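Because the risky part is often the flags rather than the command name (`sort -o` writes a file, `find -exec` runs arbitrary commands), one defensive pattern is to allowlist flags per command as well, and to quote every argument before it crosses SSH, since the remote side re-parses the command string with a shell. This is a sketch under those assumptions, not the approach from the linked post; the flag sets and the host name are hypothetical.

```python
import shlex

# Hypothetical policy for the commands in the question: each command lists
# the exact flags it may use. Combined short flags (e.g. -la) must be listed
# explicitly; anything else starting with "-" is refused (default deny).
ALLOWED_FLAGS = {
    "ls": {"-l", "-a", "-h", "-la"},
    "grep": {"-i", "-n", "-r", "-c"},
    "cat": set(),
    "tail": {"-n"},
}

def vet(argv):
    """Return None if argv passes the policy, else a reason string."""
    if not argv:
        return "empty command"
    cmd, args = argv[0], argv[1:]
    if cmd not in ALLOWED_FLAGS:
        return f"{cmd} is not allowlisted"
    for arg in args:
        if arg.startswith("-") and arg not in ALLOWED_FLAGS[cmd]:
            return f"flag {arg} is not allowed for {cmd}"
    return None

def ssh_command(host, argv):
    """Build an ssh argv for a vetted command.

    The remote side hands the command string to the user's shell, so each
    argument is quoted with shlex.join to stop it being re-split or expanded.
    """
    reason = vet(argv)
    if reason is not None:
        raise PermissionError(reason)
    return ["ssh", host, shlex.join(argv)]
```

This only constrains flags: `cat /etc/shadow` would still pass, so file permissions or path rules are still needed on the remote side.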
I also have really appreciated simonw's writing on the topic.
Disclaimer: I work at Formal, a company that helps organizations use proxies for least privilege.
Giving an LLM even read access to PII is a big "no" in my book.
On PII: if you need LLMs to work on data extracted from production, https://github.com/microsoft/presidio is a pretty good tool for redacting PII. It still needs a bit of an audit, but as a first pass it does a terrific job.
Ona (https://ona.com) is a great choice.
(full disclosure: Ona co-founder here)
For command-line stuff: if the input is plain shell text (i.e. a-z, A-Z, 0-9), a crude approach would be to have a program sit between the inbound SSH session and the database. You would need to decide how to send back an error notice when something is not allowed, i.e. in the "not OK" set (rm, mv, chmod, etc.). You may also need to break up grouped commands on a single line: even with end-of-line as the marker, one line can carry several shell commands, e.g. `echo "example"; ls *`.
awk/gawk works nicely in this role; see the awk filtering-standard-input demo concept [0]. Perhaps use ncat [4] instead of a pipe.
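The filter idea above can be sketched in Python rather than awk. It reads lines from standard input, uses `shlex` with `punctuation_chars=True` to split each line into the commands separated by `;`, `&&`, `||`, `|`, `&` and parentheses, rejects `$` and backticks (expansion and command substitution) as well as redirections, and refuses any command whose name is in the "not OK" set. Allowed lines are echoed to stdout for the next stage; rejections go to stderr. A denylist like this is easy to bypass (for example `xargs rm` or `find -exec rm`), so treat it as a crude first filter, as the comment says.

```python
import os
import shlex
import sys

DENIED = {"rm", "mv", "chmod"}  # the "not OK" set from the comment above
SEPARATORS = {";", "&&", "||", "|", "&", "(", ")"}
OPERATOR_CHARS = set("();<>|&")  # shlex's punctuation characters

def split_commands(line):
    """Split one line into argv lists at shell control operators."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    commands, current = [], []
    for token in lexer:
        if token in SEPARATORS:
            if current:
                commands.append(current)
            current = []
        elif token and set(token) <= OPERATOR_CHARS:
            # Redirections such as > or >> (and odd operators) are refused.
            # A quoted '>' is refused too, which is conservative.
            raise ValueError(f"operator {token!r} not allowed")
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands

def check_line(line):
    """Return None if the line may pass, else a rejection message."""
    if "$" in line or "`" in line:
        return "rejected: expansions and command substitution are not allowed"
    try:
        commands = split_commands(line)
    except ValueError as exc:  # unbalanced quotes or a refused operator
        return f"rejected: {exc}"
    for argv in commands:
        name = os.path.basename(argv[0])  # catches /bin/rm as well as rm
        if name in DENIED:
            return f"rejected: {name} is not allowed"
    return None

def main():
    for line in sys.stdin:
        reason = check_line(line)
        if reason:
            print(reason, file=sys.stderr)
        else:
            sys.stdout.write(line)  # pass the line on to the next stage
```

To use it as the filtering stage, call `main()` as the script's entry point and place it in front of whatever actually executes the commands.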
Perhaps make rsh (restricted shell) [5] the default shell, as used in the sshfs setup [6], and set up its restrictions.
A more technical approach would use eBPF; see the demo concept [1]. This would be able to handle non-ASCII input.
Total overkill would be using kernel capabilities, or pseudo-kernel capabilities via ptrace-related tooling [2].
Humor: should the security door covering the portal in the TV show Stargate have been called 'ncat' or '/dev/null'?
-----------------------
[0] awk/gawk: https://www.tecmint.com/read-awk-input-from-stdin-in-linux/
[1] eBPF: https://medium.com/@yunwei356/ebpf-tutorial-by-example-4-cap...
[2] ptrace: https://events.linuxfoundation.org/wp-content/uploads/2022/1...
[4] ncat: https://nc110.sourceforge.io/
[5] rsh (restricted shell): https://www.gnu.org/software/bash/manual/html_node/The-Restr...
[6] sshfs: https://stackoverflow.com/questions/35830509/sshfs-linux-how...
You can't trust any agent to be perfect with a real DB, so unless you find an infrastructure-level way to isolate it, you can't get rid of the problem.
So we built a system that creates copy-on-write copies of your DB and allocates one to each agent run. Each run gets a completely isolated copy of your DB, with all your data, that loads in under a second, so the agent operates with zero blast radius to your actual system. When you're okay with the changes, a "quick apply" replays them onto your real DB.
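The isolate-then-replay workflow can be illustrated at small scale. A minimal sketch, assuming SQLite via Python's `sqlite3` module: instead of copy-on-write branching (which this does not implement), it makes a full in-memory copy with SQLite's online backup API, lets the agent work on the copy, records statements that return no result columns (writes and DDL), and replays them onto the real database in a single transaction only after approval. The class and method names are hypothetical, and this naive replay ignores conflicts with changes made to the real DB in the meantime.

```python
import sqlite3

class SandboxedSession:
    """Hypothetical sketch: the agent works on a private copy of the DB and
    its writes are replayed onto the real DB only after human approval."""

    def __init__(self, real):
        self.real = real
        # Full copy via SQLite's online backup API. This is NOT copy-on-write:
        # it duplicates every page, which is fine for a small demo database.
        self.copy = sqlite3.connect(":memory:")
        real.backup(self.copy)
        self.pending = []  # (sql, params) of statements that returned no rows

    def execute(self, sql, params=()):
        cur = self.copy.execute(sql, params)
        rows = cur.fetchall()
        if cur.description is None:
            # No result columns: treat as a write/DDL statement to replay later.
            # Heuristic only; e.g. INSERT ... RETURNING would be missed.
            self.pending.append((sql, params))
        return rows

    def apply(self):
        """Replay pending statements onto the real DB in one transaction."""
        # The context manager commits on success and rolls back on error.
        with self.real:
            for sql, params in self.pending:
                self.real.execute(sql, params)
        self.pending.clear()

    def discard(self):
        """Throw the copy away without touching the real DB."""
        self.copy.close()
        self.pending.clear()
```

Until `apply()` is called, nothing the agent does is visible on the real connection.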
Website is a little behind since we just launched our db sandboxing feature to existing customers and are making it public next week :)
If you want to try it email me -> vikram@tryardent.com
We're building support for Snowflake too, if that's something you use.
Look into copy-on-write branching. We built this natively into our AI Data Engineer (https://tryardent.com) so it can modify databases with pretty much zero blast radius, because, yes, it's impossible to make an LLM 100% safe if there are no proper guardrails preventing destructive actions.
Disclaimer: I work at Xata.io, which provides these features. We have a recent blog post with a demo of this: https://xata.io/blog/database-branching-for-ai-coding-agents