>...I’m not entirely sure what that looks like yet, but things like this are a step in that direction.
This made me stop and think for a moment as to what this would look like as well. I'm having trouble finding it, but I think there was a post by Joe Armstrong (of Erlang) that talked about globally (as in across system boundaries, not global as in global variable) addressable functions?
1. Developer environment sandboxes. This is a cheap and convenient way to run Claude Code / Codex CLI / etc in YOLO mode in a persistent sandboxed VM with a restricted blast radius if something goes wrong.
2. Sandbox API. Fly now have a product that lets me make a simple JSON API call to run untrusted code in a new sandbox. There's even snapshotting support so I can roll back to a known state after running that code.
I wrote a bunch more about this here: https://simonwillison.net/2026/Jan/9/sprites-dev/
So let's say sprite is my building/dev ground floor. I get my thing/app to where I want it, but at the end of the day I think my thing/app is so awesome that it should be a production app for the whole world, and, I want to actually deploy it on fly, say.
Have you guys thought about that workflow, and what it might take to push button/migrate a sprite app over to fly?
Also, any plans for GPU sprites?
Would I think of this as an EC2 instance which automatically and quickly scales to zero, with pricing only for resources consumed? (CPU and RAM when up, and disk all the time?)
It's a fast starting and fast pausing persistent VM, with a ton of built in developer tools (including a preconfigured Claude Code) and an extra JSON API for executing commands within it so you can treat it as a sandbox.
You may find my writeup here useful: https://simonwillison.net/2026/Jan/9/sprites-dev/
I really hate that modern development means not having persistent disk. I’m glad there are new options coming out which let you do this in an easier way than managing my own EC2 instances!
This is needed for sandboxes if you don't want to throw them away and start over when something goes wrong.
With sprites.dev you can create an additional checkpoint and then turn Claude Code (or your preferred agent) loose to do anything. Even if it burns down the sandbox you can just restore a checkpoint in about a second.
I don't think we're going to do anything new with GPUs any time soon.
I've been thinking a lot about how to run agents (and skills) securely while giving them a lot of powerful capabilities.
I recently used their macaroons library to turn arbitrary API keys (e.g. for stripe's API) into macaroons. I route requests for an upstream host (like stripe) through Envoy as a mitm proxy which injects the real creds after verifying the macaroon.
It is such a powerful pattern. I'm always worried about leaking sensitive keys through prompt injection attacks (or just sending them to anthropic), but in this model you can attenuate the keys (both capabilities & validity window) client side. The Envoy proxy lives inside my flycast network so it can't be accessed externally.
It would be so cool if fly built something like this into sprites.dev (though I can see how it would be spooky to have fly install their own certs for stripe, etc...)
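To make the attenuation idea above concrete, here is a minimal sketch of a macaroon-style token built from a chain of HMACs. It is not the API of Fly's macaroons library and not the Envoy setup described above; the function names and the `host=`, `method=` and `expires=` caveat format are made up for illustration. The point it shows is that the holder can add restrictions without knowing the root key, while only the proxy (which holds the root key) can verify them.

```python
import hashlib
import hmac


def _chain(sig: bytes, caveat: str) -> bytes:
    """Fold one caveat into the running signature: sig' = HMAC(sig, caveat)."""
    return hmac.new(sig, caveat.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Issuer side: create an unrestricted token from the root key."""
    sig = hmac.new(root_key, identifier.encode(), hashlib.sha256).digest()
    return {"id": identifier, "caveats": [], "sig": sig}


def attenuate(token: dict, caveat: str) -> dict:
    """Holder side: add a restriction. No root key is needed for this step."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }


def _caveat_holds(caveat: str, request: dict) -> bool:
    """Check one hypothetical caveat of the form key=value against a request."""
    key, _, value = caveat.partition("=")
    if key == "host":
        return request["host"] == value
    if key == "method":
        return request["method"] == value
    if key == "expires":
        return request["now"] < float(value)
    return False  # unknown caveats fail closed


def verify(root_key: bytes, token: dict, request: dict) -> bool:
    """Proxy side: recompute the HMAC chain, then check every caveat."""
    sig = hmac.new(root_key, token["id"].encode(), hashlib.sha256).digest()
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
    if not hmac.compare_digest(sig, token["sig"]):
        return False
    return all(_caveat_holds(c, request) for c in token["caveats"])
```

In this sketch an agent would only ever see the attenuated token, e.g. `attenuate(attenuate(token, "host=api.stripe.com"), "expires=...")`. Dropping a caveat invalidates the signature, so a leaked token is no more powerful than the restrictions baked into it.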
Tokenizer is an explicit proxy though right?
My use case is very similar, but I wanted a transparent proxy so I could run unmodified scripts. It is a tricky design decision though.
I also mount a little fuse filesystem that mints a macaroon on read (with a shorter lifetime, probably inspired by y'all but i forget from where).
I work on realtime collaboration of markdown files (currently in Obsidian), which has become a shared-context substrate for agents, skills, etc.. Our own company workspace has skills that have scoped access to fly, stripe, gmail, etc. We're definitely drinking the file-over-app personal-software-for-teams Kool-Aid, so the problem space for us includes access control and auditing.
Love your work :)
Given their principled take on only trusting full-VM boundaries, I doubt they moved any of the storage stack into the untrusted VM.
So maybe a virtio-block device passing through discard to some underlying CoW storage stack, or maybe virtio-fs if it's running on ch instead of fc? Would be interesting to hear more about the underlying design choices and trade-offs.
Edit: from their website, "Since it's just ext4, you won't run into weird edge cases like you might with NFS or FUSE mounts. You can happily use shared memory files, for example, so you can run SQLite in all its modes." So it's a virtio block device supporting discard that's exposed to the VM. Interesting; fc doesn't support virtio discard passthrough, and support for ch is still in progress...
This alone was worth the upvote!
I'd love to adopt this for all my development (which I currently do using rented cloud instances, so I'm pretty comfortable with the remote development paradigm). I'm especially excited about the snapshot/clone pattern, and have (this past week) been researching solutions for exactly this problem.
Hope you launch multiple regions for this ASAP. Will be watching.
https://container-use.com/quickstart
BTW Simon, I was super happy when I heard on Theo's podcast that he will be encouraging you to monetise your work more. I'm super appreciative of your work and I'm pretty convinced that the more you profit from it, the better the universe will be!!!
We can also attach Macaroons to Fly Machines and Sprites for configurable ambient privileges, something I've wanted us to expose as a feature for a very long time.
Some things that are unclear:
- How should I auth to github? sprite console doesn't use ssh (afaik) so I guess not agent forwarding?
- What on-machine APIs are available? Can I use the fly oidc provider[1]? There's a /.sprite/api.sock but curl'ing /v1/tokens/oidc gets a 404.
- How much is it going to cost me? I know there is pricing but it's hard to figure out what actual usage would be like. Also I don't see any usage info in the webui right now.
I was previously thinking about doing the same thing on my homeserver with tailscale to expose the web interface publicly and tailscale oidc auth to an s3 bucket for object storage.
What is the contract with sprites? Is it just built-with-linux but not promising Linux? Or is it more like a machine but y'all control the container image?
I don't want to get too far into the rest of the details only because I'm writing this up for next week. They're not that interesting technically, but they're a really big deal for us in other ways.
Fly's Sprites.dev addresses dev environment sandboxes and API sandboxes together - >>46561089 - Jan 2026 (10 comments)
We realize that is not going to cover all the business cases we have been discussing with customers and plan to introduce a snapshot concept (in particular for rewinding the state of a VM to an automatic backup), but we have a lot of FS work underway before we can launch it. There are some other things we want out of our VMs that we cannot do using conventional cloud techniques, so we have code to write.
In particular, I'm really excited about the extremely fast start up time and checkpointing. I'm curious if anyone knows any alternatives in this space?
Wait, what?
But maybe you have parts of the stack that don't need to be trusted inside the VM somehow? Looking forward to the article.
You can specify a max exec time for a process when you launch it via the API.
I'd love to be able to configure the base image/VM in a way that doesn't bundle coding tools or anything else I don't need, and comes with some other binaries installed (I'm more interested in using this as an API for a sandbox use-case I have). Is there a way to do this at the moment / is this on the roadmap?
Another option would be configuring the sprite via checkpoint and then cloning the checkpoint from a base sprite, but I don't see this option anywhere either.
Is this just a fancy VPS like Digital Ocean, with an https endpoint, snapshot and restore?
(Same thing goes for exe.dev)
* Near-instant creation
* Automatic spin-down scale-to-zero, so you're not paying for it when it's not in use.
If you're using these like we are internally, you've got like 2 dozen of them sitting around in the background sleeping. They're BIC disposable computers. "When in doubt just make another one."
SQLite works great for my apps. I haven't needed object storage yet, storing files on disk is enough.
Use cases: set up my preferred env in one sprite and use that as a template for others; or fire off a few independent sprites with claude code exploring alternative solutions, then choose a winner and reap the rest.
Also "containers" always had the option to attach durable storage via bind mounts.
I still get confused by the "this isn't containers" but it's kind of similar.
Maybe I am just too caught up in semantics.
A VPS that is instant to boot, with super simple automatic routing and https proxy, snapshots and durable storage, is a win regardless.
and when I spin up a new LXC container cloud-init sets it up with the agents and my repos inside
I actually pushed to include it in the launch release. You'd have to ask Kurt why he didn't, but I think the idea is just to get more real-world usage first.
And then there's just the idea of being able to pull these out of the sky literally whenever you want one. If you want to try something new out real quick, it makes no sense to figure out which of your existing Sprites to use. Just make a new one. If you're a little OCD, like I am, every once in a while you can go prune, if you really care.
If it helps: Jerome has been working for a couple months on a local, open-source Rust version of Sprites, so you can use the same DX with your own infrastructure. We just think this is the right "shape" for modern sandboxes, wherever you actually run them.
My libvirt setup does this right now, I have a little dumb cli I wrote that lets me create, start, stop, save, restore, and destroy preconfigured machines. I use it for testing provisioning scripts and playbooks. You get the full cloud experience by including a cloud-init ISO so you can ssh to it the moment it boots with my key. Didn't realize I was at the frontier of computing paradigms.
Don't get me wrong the interface fly has is super nice but it feels like the endgame isn't remote hosted computers but a nice user-friendly interface (i.e. what docker did) but it's for persistent local VMs.
This is a large pain point today if you aren't technical, most of the chat interfaces just let you create frontend only apps.
… yes? We have a few wrapper scripts around worktree operations that copy some docker volumes (pg data, bundle cache, etc.) from the base and spin up an entirely new stack on different ports with a host alias. We don’t have to install any deps beyond that because we copied over the ruby gems bundle cache and we’re using Yarn PnP + “zero installs” for client-side deps.
I'm having trouble understanding the difference to Fly machines. If you spin up a Debian container on a machine with a persistent volume, doesn't that have everything this does? Is this about providing a layer of useful configuration/management software on top?
Do people do this? I’ve never heard of it.
Your pricing looks competitive on compute but roughly 4-5 times more expensive on memory and double on storage.
All the cool technical stuff aside - this, for me, was the standout line of the article
Also check out the 5 min demo we put out where I walk thru some sprite basics: https://www.youtube.com/watch?v=7BfTLlwO4hw
I can't say enough how, if you're using this like Kurt and Chris have been, you have like, a dozen sleeping Sprites in your Sprite list. If you're not doing anything with them, they're not really costing you anything. When you want to do something new, there's no point figuring out which of your existing Sprites to do it on. Just make a new one.
Always having a sane place to run anything I happen to be doing, without making any decisions, it's a weird feeling.
Maybe I’ve been isolated from The World for too long, but this sounds … unhealthy.
Have been experiencing intermittent connection drops as well.
https://sprites.dev/api has this command:
$ curl -X POST "https://api.sprites.dev/v1/sprites" \
    -H "Authorization: Bearer $SPRITES_TOKEN" \
    -d '{"name": "my-sprite"}'
which responds with
{"error":"name is required"}
if you use the request body in the full "Create Sprite" documentation at https://sprites.dev/api/sprites#create then it does work.
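One possible cause, not confirmed anywhere in this thread: `curl -d` sends `Content-Type: application/x-www-form-urlencoded` unless told otherwise, so a server that only parses bodies declared as JSON would never see the `name` field. The sketch below builds the same request with an explicit JSON content type. The endpoint and body are the ones quoted above; the helper name is made up, and whether this header is what the API actually needs is an assumption.

```python
import json
import urllib.request


def build_create_request(token: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a POST with an explicit JSON content type."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        "https://api.sprites.dev/v1/sprites",  # endpoint as quoted in the quick start
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the server wants the body declared as JSON.
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The curl equivalent is adding `-H "Content-Type: application/json"` to the command above.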
can I live with some rough edges for some personal workflows that only impact me when things break? sure. however, I was thinking about playing with some CI/CD stuff using sprites that would impact our whole team if things broke and I'm really on the fence because of this experience in the first 20 seconds.
Fly team - please put some black box probes or just better testing on the example you give in the quick start. if you document it, test it.
I wish more companies had open issue trackers (some proprietary software have issues on Github for example, but, it doesn't need to be Github, just let people discuss issues in the open)
a "quick start" really should just work when you copy paste it.
API downtime is a semi-frequent occurrence, as are transient API errors and slowness.
I've also had a ticket open with support for weeks due to rampant billing issues. For instance, a destroyed instance still shows up in my usage report as actively accruing billed time, and at a rate faster than is even possible (something like 2 hours for every 1 actual hour that has passed.)
They've released two new products in the AI space, this and Phoenix.new, and my worry is that they are focused on new products over making what they have good and reliable.
Then read Simon Willison's breakdown and got the 'Aha!'.
I like what they've done, played with it and immediately started to plan how I'd try to implement it myself.
I guess this will be the way to go, for development setups instead of using a dedicated machine. Especially when mobile clients are created for Sprites.
As I was reading this I was a bit confused by the issues they mention, but at work I use Claude SSHed to a persistent dev server and I’d be annoyed if I didn’t have eg my git repos there all the time or any part of that workflow was ephemeral. I’m not really aware of what everyone else is doing with sandboxes etc.
But the bit at the end with the MDM server made it click for me. I’ve started generating tiny iOS apps for personal software stuff, because they solve data storage better than the web (at least on iOS). A database on some other server seems like a bad fit/overkill for this stuff, client side storage is too flaky because Safari. But iOS apps are limiting in their own annoying ways compared to web apps.
This looks like a really interesting solution, I can just store the data on a sprite with SQLite or whatever. Visit its URL to use my app, then does it go away on its own after a short time? I could have done that before with a server with storage, but this seems easier/probably cheaper.
If this works well/the way I’m hoping, it might be the sweet spot for simple personal software that needs persistent data and that you want to run anywhere.
One feature that would make this really nice is if it could have something like Vercel preview environments, where I need to auth my fly account to view the URL. That'd solve the public URL without me needing to do my own auth thing in every app.
That said, I dread having to do anything CLI related, which for hobby projects is like once every few weeks.
Glancing at the docs for Sprite, I worry that this will be another CLI where a good 95% of the time that I go to invoke a command, my workflow is interrupted by an auto-updater that takes longer than whatever interaction I'm trying to do and derails my train of thought.
I had a few issues
1. manpath: can't set the locale; make sure $LC_* and $LANG are correct
suspect this is due to it inheriting locale from my local machine? easy to get around with some updates to .bashrc
2. the $SHELL environment variable in my sprite is `/opt/homebrew/bin/fish`. I use fish on my local (mac + homebrew) machine and it seems to have been inherited from there. It's nice to be using fish in the sprite, but it seems weird that $SHELL in the sprite points to a non-existent path. Slightly concerning that a local env var is being transferred to a remote machine without my explicit permission; I have some sensitive env vars locally.
Running IncusOS on some local hardware with ZFS underneath is a phenomenally powerful sandbox.
Thanks! Also looking forward to reading the post :)
> the idea is just to get more real-world usage first
My particular wish notwithstanding, I agree with this.
In terms of actually making the app, I don't know Swift or iOS at all so it's all generated. Usual caveats, and I'm only running them on my own phone. I ask Claude (not code) to help me with the spec, I give it some bullet points and it asks a bunch of clarifying questions then gives me a spec. I put that in a new directory, fire up Claude and use the ralph-loop plugin (https://github.com/anthropics/claude-code/tree/main/plugins/...):
> /ralph-loop:ralph-loop "Implement the iOS app described in app-spec.md. You have access to xcode CLI tools. You should write tests and use them to verify your work. The task will be complete when the app is fully implemented, with all tests passing. Output <promise>COMPLETE</promise> when finished." --max-iterations 50 --completion-promise "COMPLETE"
Once it's done you can open the app in XCode, test it in a simulator, play with it and iterate a bit and then send it to your phone!
Editing to add because I can't edit the original post: I think the limiting factor here might be the concurrent sprites limit. It seems like if you're on pay-as-you-go then you can only have 3 running concurrently, and have to subscribe to get 10.
i am dying to know: firecracker still? I know you have an upcoming post abt it, but i'm incredibly impatient when it comes to cool new infra
> Despite all that, they’re fully durable. They don’t die until I tell them to.
what?
Seems like they are using JuiceFS under the hood, with an overlay root for your CoW semantics. JuiceFS gives them instant clone (because they're not cloning the whole rootfs), while the changes to the overlay are done as an overlayfs and probably synced back to S3 via a custom block device they have mounted into firecracker.
You can also see they are using juicefs for the "policy" directly (which I'm assuming is the network policy functionality). iirc juicefs has support for block devices too, so maybe they are using that to back the rootfs overlay.
One concerning thing is the `/var/lib/docker` mount - i ran this in an ubuntu container, did they... attach it? Maybe that's a coincidence, but docker is not installed on the sprite by default. (the terminal is also super busted when used through an ubuntu container)
https://pastebin.com/raw/kt6q9fuA (edit: moved terminal output to pastebin because it was so ugly here)
I played with a similar stack recently, my guess is they are:
1. making some base vm, snapshotting it
2. when you create a vm, they just restore a copy and push metadata to it (probably via one of the mounts)
3. any changes that you make to the rootfs are stored on the juicefs block device (the overlay), which is relatively minimal compared to the base os. JuiceFS also supports snapshotting, so that's probably how they support memory + filesystem snapshot and restore so quick
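As a toy model of the guess above (not Fly's actual design, which the thread does not confirm), the sketch below treats the base image as a shared read-only mapping and keeps each VM's writes in a small private overlay. The class and method names are invented for illustration, and deletions (whiteouts) are left out. It shows why checkpoint, restore and clone can be cheap in such a layout: they only ever copy the overlay, never the base.

```python
class OverlayDisk:
    """Toy copy-on-write disk: shared read-only base plus a private overlay."""

    def __init__(self, base: dict):
        self._base = base        # shared between all disks, never mutated here
        self._overlay = {}       # this disk's own writes
        self._checkpoints = {}   # checkpoint name -> saved copy of the overlay

    def read(self, path: str):
        # The overlay shadows the base, as the upper layer does in an overlayfs.
        if path in self._overlay:
            return self._overlay[path]
        return self._base.get(path)

    def write(self, path: str, data: str) -> None:
        # Writes land in the overlay only; the base stays untouched.
        self._overlay[path] = data

    def checkpoint(self, name: str) -> None:
        # Cost is proportional to the overlay size, not the base image size.
        self._checkpoints[name] = dict(self._overlay)

    def restore(self, name: str) -> None:
        self._overlay = dict(self._checkpoints[name])

    def clone(self) -> "OverlayDisk":
        # A clone shares the base and starts from a copy of the overlay.
        other = OverlayDisk(self._base)
        other._overlay = dict(self._overlay)
        return other
```

A real system would do this at the block level and persist the overlay somewhere durable, but the shape is the same: a big immutable base, small mutable deltas.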
interestingly, seems they provision maybe a max disk size of 100GB for total checkpoints?
```
NAME TYPE SIZE FSTYPE MOUNTPOINTS
loop0 loop 100G /.sprite/checkpoints/active
```
fuse is definitely being used within the VMM, i can see a fuse mount and id being assigned. They're probably using juicefs directly for the policy mount because that doesn't need to be local nvme-cached, just consistent. The local-nvme -> s3 write-through runs on the hypervisor through a custom block device they attach to the firecracker vmm. This might just be the --cache-dir + --writeback cache option in juicefs. Wild guess is just 1 file per block.
guessing the "s3" here is tigris, since fly.io seems to have a relationship with them, and that probably keeps latency down for the filesystem
https://github.com/superfly/sprites-js/tree/main/examples https://github.com/superfly/sprites-go/tree/main/examples https://github.com/superfly/sprites-py/tree/main/examples https://github.com/superfly/sprites-ex/tree/main/examples
Is the fat bundled environment harmful for you, or just extra stuff you don't care about?
In the longer term, docker is nice from a reproducibility + CI perspective, and a docker build is already something I can easily work with and track in my system.
One thing I've heard but not verified with other sandboxed execution providers is that startup times for custom images can be quite slow, so it could be a potential differentiator given Fly's existing infra.
just assume it’s json. you’re gonna parse and validate it anyway.
There goes the neighborhood.
If they hardcode JSON such a change would be breaking for their previous users.
var envVars []string
shellEnvVars := []string{
    "BASH_VERSION",
    "ZSH_VERSION",
    "FISH_VERSION",
    "KSH_VERSION",
    "tcsh",
    "SHELL",
}
It's also reading terminfo. It's not handling absolute paths to shells properly, though. If you want to skip this, running `sprite exec -tty /bin/bash --login` or similar avoids the magic.
As I understand it Unison tries to do something like that but that might be wrong.
The way they force at least one "fuck" into all of their promo is just so cool.
Erm, I mean so fuck cool!
1. Coderunner - https://github.com/instavm/coderunner
npm install @anthropic-ai/sprites
Is there some relationship between Anthropic and Fly.io that I didn't hear about?
I've been wanting to sandbox Copilot/Claude Code for a while, but I don't want to pay for a PaaS just to do that. I want to run the sandbox on my M4 chip instead of needing a constant internet connection to run code on an anaemic remote CPU.
My issue is I've had my work laptop wiped twice because of things I've installed on it and it's a hassle to switch accounts/devices, but I'd love to give sprites a go.
Additionally, is Tailscale/Wireguard connectivity something you'd consider?
https://www.youtube.com/@t3dotgg/videos
It was in one of his videos from last week.
What I've been waiting for, for a long time. Basically the thing you need if you want agents to run freely but still in a safe way kinda.
>For reasons we’ll get into when we write up how we built these things, you wouldn’t want to ship an app to millions of people on a Sprite. But most apps don’t want to serve millions of people. The most important day-to-day apps disproportionately won’t have million-person audiences.
I really appreciate this vision of personal computing.
I'll give sprites a try, they sound super cool.
Here's the repo [1]. I modified it a bit to post publicly and remove the details of my setup within my tailnet/flycast network.
[0] >>46605155