I'd like to see a formal container security grade that works like:
1) Curate a list of all known (container) exploits
2) Run each exploit in environments of increasing security like permissions-based, jail, Docker and emulator
3) The percentage of prevented exploits would be the score from 0-100%
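A minimal sketch of how step 3 could be computed, assuming each exploit run is recorded as a (exploit, environment, escaped) result. The `Trial` record, the `security_grade` function, the exploit names and the results below are all hypothetical placeholders, not measurements of any real container:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    exploit: str      # hypothetical exploit identifier
    environment: str  # e.g. "jail", "docker", "emulator"
    escaped: bool     # True if the exploit broke out of the environment


def security_grade(trials, environment):
    """Return the percentage (0-100) of exploits the environment prevented."""
    relevant = [t for t in trials if t.environment == environment]
    if not relevant:
        raise ValueError(f"no trials recorded for {environment!r}")
    prevented = sum(1 for t in relevant if not t.escaped)
    return 100.0 * prevented / len(relevant)


# Placeholder results, invented only to show the shape of the data.
EXPLOITS = ["exploit-a", "exploit-b", "exploit-c", "exploit-d"]
ESCAPES = {
    "jail": {"exploit-a", "exploit-b", "exploit-c", "exploit-d"},
    "docker": {"exploit-a", "exploit-b"},
    "emulator": set(),
}
trials = [
    Trial(exploit, env, exploit in escaped)
    for env, escaped in ESCAPES.items()
    for exploit in EXPLOITS
]

if __name__ == "__main__":
    for env in ESCAPES:
        print(f"{env}: {security_grade(trials, env):.0f}%")
```

The grade is only as meaningful as the exploit list from step 1: every environment has to be run against the same curated set, or the percentages are not comparable.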
Under this scheme, I'd expect naive attempts at containerization with permissions and jails to score around 0%, while Docker might be above 50% and Microsandbox could potentially reach 100%. This might satisfy some of our intuition around questions like "why not just use a jail?". Also, the containers could run on a site on the open web as honeypots, with cash or crypto prizes for pwning them, to "prove" which containers achieve 100%.
We might also need to redefine what "secure" means, since exploits like Rowhammer and Spectre may make nearly all conventional and cloud computing insecure. Or maybe it's a moving target, like how 64-bit encryption might once have been considered secure but now we need 128-bit or higher.
Edit: the motivation behind this would be to find a container that's 100% secure without emulation, for performance and cost-savings benefits, as well as gaining insights into how to secure operating systems by containerizing their various services.
I think it's generally understood that any sort of kernel LPE (local privilege escalation) can potentially lead to breaking all security boundaries on the local machine, and is therefore generally treated as doing so, since the kernel contains no internal security boundaries. That includes not only containers but also everything else, such as user separation, hardware virtualization controlled by the local kernel, and kernel-private secrets.
There is no inherent advantage to virtualization; the only thing that matters is the security and robustness of the privileged host.
The only reason there is any advantage in common use is that the Linux kernel is a security abomination designed for default-shared/allow services that people are now trying to kludge into providing multiplexed services. But even that advantage is minor against modern, commonplace threat actors who can spend millions to tens of millions of dollars finding security vulnerabilities in core functions and services.
To reach the minimum bar for security against prevailing threat actors, let alone near-future ones, you need privileged manager code in which a highly skilled team of 10, given 3 years to pound on it, cannot find any vulnerabilities.
The advantage to virtualization is that the syscall interface is being implemented by the guest kernel at a lower privilege level instead of the host kernel at a higher privilege level.