This is exactly what I am building for a friend, in a semi-amateur fashion, with LLMs. Looking at your codebase, I would probably end up with something very similar in 6 months. You even have an Air toml and use Firecracker, not to mention Go. Great minds think alike, I suppose :D. Mine is not for AI but for running unvetted data science scripts. Simple stuff mostly. I am using rootless Podman to create the microVM images (I think you are using Docker? Or perhaps Packer, which is a tool I didn't know about until now), and the images have no network access. We're creating an .ext4 disk image to bring in the data/script.
I think I might just "take" this if the resource requirements are not too demanding. Thanks for sharing. Do you have docs for deploying on bare metal?