Microsandbox: Virtual Machines that feel and perform like containers

I'd like to see a formal container security grade that works like:

  1) Curate a list of all known (container) exploits
  2) Run each exploit in environments of increasing security like permissions-based, jail, Docker and emulator
  3) The percentage of prevented exploits would be the score from 0-100%

Under this scheme, I'd expect naive attempts at containerization with permissions and jails to score around 0%, while Docker might be above 50% and Microsandbox could potentially reach 100%.

This might satisfy some of our intuition around questions like "why not just use a jail?". Also the containers could run on a site on the open web as honeypots with cash or crypto prizes for pwning them to "prove" which containers achieve 100%.

We might also need to redefine what "secure" means, since exploits like Rowhammer and Spectre may make nearly all conventional and cloud computing insecure. Or maybe it's a moving target, like how 64 bit encryption might have once been considered secure but now we need 128 bit or higher.

Edit: the motivation behind this would be to find a container that's 100% secure without emulation, for performance and cost-savings benefits, as well as gaining insights into how to secure operating systems by containerizing their various services.

>>zackmo+EL
The issue, at least with multitenant workloads, isn't "container vulnerabilities" as such; it's that standard containers are premised on sharing a kernel, which makes every kernel LPE a potential container escape --- there's a long history of those bugs, and they're only rarely flagged as "container escapes"; it's just sort of understood that a kernel LPE is going to break containers.

>>tptace+JR
> it's just sort of understood that a kernel LPE is going to break containers.

I think it's generally understood that any sort of kernel LPE can potentially (and therefore is generally considered to) lead to breaking all security boundaries on the local machine, since the kernel contains no internal security boundaries. That includes both containers, but also everything else such a user separation, hardware virtualization controlled by the local kernel, and kernel private secrets.

>>delusi+iR1
A large proportion of LPE vulnerabilities are in the nature of "perform a syscall to pass specially crafted data to the kernel and trigger a kernel bug". For containers, the kernel is the host kernel and now the host is compromised. For VMs, the kernel is the guest kernel and now the guest is compromised, but not the host. That's a much narrower compromise and in security models where root on the guest is already expected to be attacker-controlled, isn't even a vulnerability.

>>zrm+Ms2
VM sandbox escape is just "perform a hypercall/trap to pass specially crafted data to the hypervisor and trigger a hypervisor bug". For virtual machines, the hypervisor is the privileged host and now the host is compromised.

There is no inherent advantage to virtualization, the only thing that matters is the security and robustness of the privileged host.

The only reason there is any advantage in common use is that the Linux Kernel is a security abomination designed for default-shared/allow services that people are now trying to kludge into providing multiplexed services. But even that advantage is minor in comparison to modern, commonplace threat actors who can spend millions to tens of millions of dollars finding security vulnerabilities in core functions and services.

You need privileged manager code that a highly skilled team of 10 with 3 years to pound on it can not find any vulnerabilities in to reach the minimum bar to be secure against prevailing threat actors, let alone near-future threat actors.

>>Veserv+zH2
The syscall interface has a lot more attack surface than the hypercall interface. If you want to run existing applications, you have to implement the existing syscall interface.

The advantage to virtualization is that the syscall interface is being implemented by the guest kernel at a lower privilege level instead of the host kernel at a higher privilege level.

zlacker