But as I said, slow.
Sure they are. You could generate every possible input, compute its hash, and compare it with the given one.
OK, it might take an infinite amount of compute (time/energy). But that's just a technicality, right?
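To make the "generate every input and compare" idea concrete, here is a minimal sketch. It assumes SHA-256 and a deliberately tiny search space; the function name, alphabet, and length limit are made up for illustration. The loop visits up to len(alphabet)^max_len candidates per length, which is why it stops being practical almost immediately.

```python
import hashlib
from itertools import product


def brute_force_preimage(target_hex, alphabet=b"abc", max_len=3):
    """Try every byte string over `alphabet` up to `max_len` bytes long.

    Returns the first candidate whose SHA-256 hex digest equals
    `target_hex`, or None if the search space is exhausted.
    """
    for length in range(max_len + 1):
        # product() over a bytes object yields tuples of ints (byte values).
        for combo in product(alphabet, repeat=length):
            candidate = bytes(combo)
            if hashlib.sha256(candidate).hexdigest() == target_hex:
                return candidate
    return None


if __name__ == "__main__":
    target = hashlib.sha256(b"cab").hexdigest()
    print(brute_force_preimage(target))
```

The search only succeeds here because the original input is known to lie inside the small space being enumerated.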
Depends entirely on what you mean by reversible. For every hash value, there are an infinite number of inputs that give that value. So while it is certainly possible to find some input that hashes to a given value, you cannot know which input I originally hashed to get that value.
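A small sketch of the "many inputs per hash value" point. It assumes a toy 8-bit hash (the first byte of SHA-256) so that collisions show up quickly; tiny_hash and preimages are hypothetical names, not part of any library. With a real 256-bit digest the same pigeonhole argument holds, it just cannot be demonstrated by enumeration.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of the SHA-256 digest (illustration only)."""
    return hashlib.sha256(data).digest()[0]


def preimages(target: int, limit: int = 5):
    """Collect `limit` different inputs that all map to `target`."""
    found = []
    i = 0
    while len(found) < limit:
        candidate = str(i).encode()  # candidates are b"0", b"1", b"2", ...
        if tiny_hash(candidate) == target:
            found.append(candidate)
        i += 1
    return found


if __name__ == "__main__":
    original = b"secret"
    target = tiny_hash(original)
    # Every entry hashes to the same value as `original`, so the hash alone
    # cannot tell you which one was actually hashed.
    print(target, preimages(target))
```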
I think you'd have to compare the actual data before purging. You can only deduplicate (purge) a block if it really is identical to the stored one; otherwise you have to keep it, because you can't replace it with the hash when the hash link in the pool points to different data.
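The compare-before-purge rule could be sketched like this. It is an in-memory toy under stated assumptions: DedupStore and its methods are hypothetical, and a block is referenced by (digest, index) so that two different blocks with the same digest can both be kept.

```python
import hashlib


class DedupStore:
    """Toy block pool that verifies content before deduplicating."""

    def __init__(self, hash_fn=None):
        # Default to SHA-256; a weaker hash_fn can be injected to force collisions.
        self._hash = hash_fn or (lambda b: hashlib.sha256(b).digest())
        self._pool = {}  # digest -> list of distinct blocks sharing that digest

    def put(self, block: bytes):
        digest = self._hash(block)
        bucket = self._pool.setdefault(digest, [])
        for index, stored in enumerate(bucket):
            if stored == block:  # byte-for-byte check, not just the hash
                return (digest, index)  # true duplicate: reuse the stored block
        bucket.append(block)  # new data, or a hash collision: keep the block
        return (digest, len(bucket) - 1)

    def get(self, ref):
        digest, index = ref
        return self._pool[digest][index]

    def stored_blocks(self) -> int:
        return sum(len(bucket) for bucket in self._pool.values())


if __name__ == "__main__":
    # A deliberately bad hash (block length only) makes every 4-byte block collide.
    store = DedupStore(hash_fn=lambda b: bytes([len(b) % 256]))
    a = store.put(b"aaaa")
    b = store.put(b"bbbb")   # same "hash", different data: kept separately
    a2 = store.put(b"aaaa")  # real duplicate: deduplicated
    print(a == a2, store.get(b), store.stored_blocks())
```

The cost of this design is one extra read and comparison of the stored block for every candidate duplicate.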
With MD5 (128-bit), if AWS S3 applied this technique, it would only get a handful of collisions. Using a 256-bit hash would drive that down to a level where any collision is very unlikely.
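For a rough feel of the numbers, the usual birthday approximation puts the expected number of colliding pairs among n random blocks at about n(n-1)/2 divided by 2^bits. The sketch below assumes an ideal, uniformly distributed hash and purely accidental collisions; the block counts are made-up inputs, not figures for any real service.

```python
def expected_collisions(n_blocks: int, hash_bits: int) -> float:
    """Expected number of colliding pairs for an ideal hash (birthday approximation)."""
    pairs = n_blocks * (n_blocks - 1) // 2  # number of distinct pairs of blocks
    return pairs / 2 ** hash_bits           # each pair collides with probability 2^-bits


if __name__ == "__main__":
    for n in (10 ** 12, 10 ** 15, 2 ** 64):  # hypothetical block counts
        print(n, expected_collisions(n, 128), expected_collisions(n, 256))
```

Under this model the expected count at 128 bits only approaches 0.5 at around 2^64 blocks. Note that it says nothing about deliberately constructed collisions, which are practical for MD5.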
This would be worth it if a 4 KB block has, on average, at least a 6.25% chance of being a duplicate (not considering the overhead of data structures etc.).
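One simple way to read that break-even, and this is an assumption since the comment does not spell out its model: every block pays a fixed number of bytes for its hash reference, and every duplicate saves one full block, so deduplication pays off once the duplicate rate reaches overhead divided by block size. The function name and parameters below are illustrative.

```python
def break_even_duplicate_rate(overhead_bytes: int, block_bytes: int) -> float:
    """Minimum fraction of duplicate blocks at which savings cover the per-block overhead.

    Assumed model: p * block_bytes >= overhead_bytes, hence p >= overhead / block.
    """
    return overhead_bytes / block_bytes


if __name__ == "__main__":
    print(break_even_duplicate_rate(256, 4096))  # 0.0625
    print(break_even_duplicate_rate(32, 4096))   # 0.0078125
```

Under this model, 6.25% corresponds to 256 bytes of overhead per 4096-byte block; a bare 32-byte (256-bit) digest per block would break even at roughly 0.78%.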