The term raises questions: Okay, so, what does it mean? How 'pseudo' is pseudo? And that's the point: when you pseudonymize data, you must ask those questions, and nothing is black and white anymore.
My go-to example to explain this is very simple: Let's say we reduce birthdate info to just your birth year, and geolocation info to just a wide area. And then I have a pseudonymized individual who is marked down as being 105 years old.
Usually there's only one such person.
I invite everybody who works in this field to start using the term 'pseudonymization'.
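The 105-year-old example can be sketched as a quick uniqueness check. This is only an illustration: the records are made up, and `unique_combinations` is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Made-up, already "reduced" records: birth year and wide area only.
records = [
    {"birth_year": 1985, "region": "North"},
    {"birth_year": 1985, "region": "North"},
    {"birth_year": 1990, "region": "South"},
    {"birth_year": 1990, "region": "South"},
    {"birth_year": 1919, "region": "North"},  # the 105-year-old
]

def unique_combinations(records, quasi_identifiers):
    """Return the value combinations that occur exactly once."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return [combo for combo, n in counts.items() if n == 1]

# Any combination listed here points at a single individual.
singled_out = unique_combinations(records, ["birth_year", "region"])
```

Anything that comes back from this check is a person who can still be singled out, even though no name or exact birthdate is stored.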
> My go-to example to explain this is very simple: Let's
> say we reduce birthdate info to just your birth year,
> and geolocation info to just a wide area. And then I have
> a pseudonymized individual who is marked down as
> being 105 years old.
> Usually there's only one such person.
I was interested to find that HIPAA's requirements for de-identification address the two particular issues you pointed out. First, ages of 90 and above must be bucketed together as "90 or older." Second, regarding ZIP codes: you must zero out the last two digits. And then, if the resulting three-digit prefix covers fewer than 20,000 inhabitants according to the most recent US census, you have to blank the first three digits as well (there are currently 17 such three-digit prefixes). Source: Pages 96-97 of the combined legislation, available at: https://www.hhs.gov/hipaa/for-professionals/privacy/laws-reg...
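Those two rules, as I described them above, could look roughly like this in code. It's a sketch of my reading, not a compliance tool: the function names are hypothetical, and the caller has to supply the real list of low-population prefixes (the `"999"` in the usage lines is a placeholder, not one of the actual 17).

```python
def generalize_age(age):
    """Bucket ages of 90 and above into a single category."""
    return "90+" if age >= 90 else str(age)

def generalize_zip(zip_code, low_population_prefixes):
    """Zero the last two digits; blank everything for small prefixes.

    low_population_prefixes is a caller-supplied set of three-digit
    prefixes covering fewer than 20,000 inhabitants.
    """
    prefix = zip_code[:3]
    if prefix in low_population_prefixes:
        return "00000"
    return prefix + "00"

# Usage with a placeholder prefix set (NOT the real list):
placeholder_prefixes = {"999"}
age_label = generalize_age(105)
kept = generalize_zip("12345", placeholder_prefixes)
blanked = generalize_zip("99901", placeholder_prefixes)
```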
You are allowed to roll your own de-identification method, as long as the person doing so is an expert on statistics and de-identification and they document their analysis of why their method is sound. To my knowledge, most entities use the "safe harbor" approach of wiping any data in the legislated blacklist of dimensions.
Let's say you have 20,000 inhabitants: you only need about 14 binary fields to single someone out (log2 of 20,000 is roughly 14.3, assuming the fields are evenly split and independent), and far fewer fields if they are not binary (which is quite likely). You most likely already have the gender... Even if you limit the age to 10 possible values, that's equivalent to a bit more than 3 binary fields!
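The back-of-the-envelope math, as a small sketch. It assumes every field is uniformly distributed and independent of the others, which real data never quite is, so treat the numbers as rough estimates; the function names are just for illustration.

```python
import math

def bits_to_single_out(population):
    """Bits of information needed to pick one person out of a population."""
    return math.log2(population)

def bits_from_field(distinct_values):
    """Bits contributed by one field with equally likely values."""
    return math.log2(distinct_values)

needed = bits_to_single_out(20_000)   # roughly 14.3 bits
from_gender = bits_from_field(2)      # 1 bit
from_age = bits_from_field(10)        # roughly 3.3 bits

# Bits still missing once gender and a 10-bucket age are known.
remaining = needed - from_gender - from_age
```

So gender plus a coarse age bucket already covers close to a third of what is needed for a 20,000-person area.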