But there's a lot hidden in "same inputs", because that includes everything the operating system feeds into your program: things like "time" (bane of reproduction), memory layout, the scheduling order of multithreaded code, the value of uninitialized memory, and so on.
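To make that concrete, here's a minimal sketch of turning two of those hidden inputs (the clock and the random source) into explicit parameters so a run can be pinned down. `make_token` and `reproducible_token` are made-up names for illustration, not anything from a real codebase.

```python
import random
import time
from typing import Callable, Optional


def make_token(now: Callable[[], float] = time.time,
               rng: Optional[random.Random] = None) -> str:
    """Hypothetical function whose output depends on time and randomness.

    By default it reads the real clock and an unseeded RNG, so two runs
    with the "same" arguments can still produce different results.
    """
    rng = rng or random.Random()
    return f"{int(now())}-{rng.randrange(1_000_000):06d}"


def reproducible_token() -> str:
    # Pin both hidden inputs: a frozen clock and a seeded RNG.
    # The timestamp and seed here are arbitrary illustrative values.
    return make_token(now=lambda: 1_700_000_000.0, rng=random.Random(42))
```

This only covers the inputs you can inject. Memory layout and thread scheduling need different tools, so treat it as one piece of the puzzle rather than a fix for all of the above.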
> Another approach would have been to tweak their test case until they found a situation which reproduced the bug more or less often, trying to find the threshold that causes it and continuing to deduce from there.
Yes - when dealing with unknowns in a huge problem space, it can be very effective to play hotter-colder: change one thing, see whether the bug shows up more or less often, and keep climbing the hill.
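A rough sketch of what hotter-colder can look like in practice, assuming you have a single numeric knob to turn (thread count, payload size, delay, whatever) and a test that returns True when the bug reproduces. `failure_rate`, `climb`, and the knob itself are hypothetical names for illustration.

```python
from typing import Callable, Tuple


def failure_rate(trial: Callable[[int], bool], knob: int, runs: int = 200) -> float:
    """Fraction of runs in which trial(knob) reproduces the bug."""
    return sum(1 for _ in range(runs) if trial(knob)) / runs


def climb(trial: Callable[[int], bool], start: int, step: int,
          lo: int, hi: int, runs: int = 200) -> Tuple[int, float]:
    """Move the knob in whichever direction makes the bug more frequent.

    Stops when neither neighbour reproduces the bug more often than the
    current setting, and returns (knob, observed failure rate).
    """
    knob = start
    best = failure_rate(trial, knob, runs)
    while True:
        # Try one step colder and one step hotter, staying within bounds.
        candidates = [k for k in (knob - step, knob + step) if lo <= k <= hi]
        if not candidates:
            return knob, best
        rate, k = max((failure_rate(trial, c, runs), c) for c in candidates)
        if rate <= best:
            return knob, best
        knob, best = k, rate
```

With a genuinely flaky bug the measured rates are noisy, so a low `runs` value can stall the climb on a fluke. More runs per setting costs time but gives a steadier signal.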