Though abs() returning negative numbers is hilarious... “You had one job…”
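(For anyone wondering how that can even happen: I'm assuming this is the usual two's-complement edge case, where the most negative fixed-width integer has no positive counterpart, so negating it wraps back to itself. Java's `Math.abs(Integer.MIN_VALUE)` behaves this way. Below is a rough sketch that mimics it in Python; `abs_int32` is a made-up helper for illustration, not a real library function.)

```python
INT32_MIN = -2**31


def abs_int32(x: int) -> int:
    """Mimic abs() on a 32-bit two's-complement int with silent wraparound.

    Hypothetical helper: Python ints never overflow, so the wrap is simulated.
    """
    result = -x if x < 0 else x
    # Wrap the result back into the signed 32-bit range [-2**31, 2**31 - 1].
    return (result + 2**31) % 2**32 - 2**31


print(abs_int32(-5))         # ordinary case: 5
print(abs_int32(INT32_MIN))  # edge case: still negative (-2147483648)
```

+2147483648 doesn't fit in 32 signed bits, so the "absolute value" lands right back on the minimum.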
To me, the hardest bugs are nearly irreproducible “Heisenbugs” that vanish when instrumentation is added.
I’m not just talking about concurrency issues either…
The kind of bug where a reproduction attempt takes a week, can't be parallelized due to HW constraints, and adding logging instrumentation makes it go away or fail differently.
2 days is cute though.
Part of it was the difficulty of pinpointing the actual cause: fullness of the drive vs. throughput of writes.
A lot of it, unfortunately, was organizational politics: the system spanned two teams with different reporting lines that didn't cooperate well and had poor testing practices.