Curious about the hash collision rate in practice. The README mentions 154K+ entries from Diablo II patches. With that sample size, have you encountered meaningful false positives where structurally similar but semantically different functions matched? The Version Tracker comparison in the comments is fair — seems like combining this hash approach with additional heuristics (xref patterns, call graph structure) could reduce both false positives and negatives.
The headless Docker mode is a nice touch for CI integration. Being able to batch-analyze binaries and auto-propagate annotations without spinning up a GUI opens up some interesting automated diffing workflows.