We reclaim memory with a memory balloon device, for the disk trimming we discard (& compress) the disk, and for i/o speed we use io_uring (which we only use for scratch disks, the project disks are network disks).
It's a tradeoff. It's more work and does require custom implementations. For us that made sense, because in return we get a lightweight VMM that we can more easily extend with functionality like memory snapshotting and live VM cloning [1][2].
[1]: https://codesandbox.io/blog/how-we-clone-a-running-vm-in-2-s...
[2]: https://codesandbox.io/blog/cloning-microvms-using-userfault...
> i/o speed we use io_uring
custom io_uring based driver for the VM block devices? or what do you mean here?