Broken Proofs and Broken Provers

>>RebelP+(OP)
I'm a bit skeptical about taking the rates of reported soundness bugs between different systems and drawing conclusions about the underlying approaches. There's typically a class of bugs that users can stumble into by doing something unusual, and then another class of bugs that can only really be found by exploiting holes in the implementation. The first class depends on which features get exercised the most, the second class depends on how many eyes are on the source (and how accessible it is), and both classes heavily depend on the overall size of the userbase.

E.g., Metamath is designed to be as theoretically simple as possible, to the point that it's widely considered a toy in comparison to 'serious' proof systems: a verifier is mainly just responsible for pushing around symbols and strings. In spite of this simplicity, I was able to find soundness bugs in a couple major verifiers, simply because few people use the project to begin with, and even fewer take the time to pore over the implementations.

So I'd be hesitant to start saying that one approach is inherently more or less bug-prone than another, except to be slightly warier in general of larger or less accessible kernels.

>>Legion+UT
LCF-style provers like Isabelle/HOL and HOLlight are some of the most widely used, and oldest interactive theorem provers. If they consistently show smaller error rates than other systems, that is an interesting empirical observation. To give but one recent example: Amazon recently announced a vast 260000 lines of Isabelle/HOL-checked correctness proof of their new Nitro hypervisor for AWS EC2 Graviton5 instances.

LCF-style provers have a much smaller trusted computing base than Curry/Howard based provers like Coq, Agda and Lean.

One may wonder if there is a correlation between size of TCB and error rate in widely used provers?

>>Gregar+Id1
> LCF-style provers have a much smaller trusted computing base than Curry/Howard based provers like Coq, Agda and Lean.

I'm not sure that this is correct. The TCB in a CH based prover is just the implementation of the actual kernel. In LCF, you also have to trust that any tactics are implemented in a programming language that doesn't allow you to perform unsound operations. That's a vast expansion in your TCB. (You can implement LCF-like "tactics" in a CH-based prover via so-called reflection that delegates the proof to a runtime computation, but you do have to prove that your computation yields a correct decision for the problem.)

zlacker