I'm not familiar with the boolean circuit problem, but I wonder if it's an instance where the NP hardness comes from specific edge cases, and whether your experiment tested said edge cases. Compare with the fact that the C++ compiler is Turing complete: its Turing completeness arises from compiling extremely contrived bizzarro code that would never come up in practice. So for everyday code, humans can answer the question, "Will the C++ compiler enter an infinite loop when it tries to compile this code?", quite easily, just by answering "No." every time. That doesn't mean humans can solve the halting problem, though.