Algorithm can pick out almost any American in supposedly anonymized databases
1. rzwits+e6 2019-07-24 10:58:05
>>zoobab+(OP)
I'm a programmer in the GP data analysis world. We use the term 'pseudonymization' for this kind of data. 'Anonymization' is reserved for aggregated data that can no longer be reduced to a single individual at all, for example 'the total number of diabetes patients this practice has' (that would be anonymous patient data, though it would not be anonymous with respect to the GP office it refers to).

The term raises questions: Okay, so, what does it mean? How 'pseudo' is pseudo? And that's the point: when you pseudonymize data, you must ask those questions, and nothing is black and white anymore.

My go-to example to explain this is very simple: Let's say we reduce birthdate info to just the birth year, and geolocation info to just a wide area. And then I have a pseudonymized individual who is marked down as being 105 years old.

Usually there's only one such person in that area, so the record still points to a single individual.
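
To make the 105-year-old example concrete, here is a minimal sketch that counts how many records share each (birth year, region) pair and lists the ones that are alone in their group. The records, field layout, and the `unique_combinations` helper are all made up for illustration; real GP data would have many more fields to check.

```python
from collections import Counter

# Hypothetical pseudonymized records: (pseudonym, birth_year, region).
# Names are already replaced by pseudonyms; birthdate and location are coarsened.
records = [
    ("p01", 1984, "north"),
    ("p02", 1984, "north"),
    ("p03", 1984, "north"),
    ("p04", 1991, "south"),
    ("p05", 1991, "south"),
    ("p06", 1914, "north"),  # the 105-year-old (as of 2019)
]

def unique_combinations(rows):
    """Return pseudonyms whose (birth_year, region) pair occurs exactly once."""
    counts = Counter((year, region) for _, year, region in rows)
    return [pid for pid, year, region in rows if counts[(year, region)] == 1]

singled_out = unique_combinations(records)
print(singled_out)
```

Anyone who shows up in that list is only 'pseudo' protected: somebody who knows the area and the age can likely work out who it is.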

I invite everybody who works in this field to start using the term 'pseudonymization'.

2. mgkims+V6 2019-07-24 11:08:20
>>rzwits+e6
I worked on a reporting/dataviz project a few years back, showing survey results from states/districts/schools. All the data was included in the totals, but if a school had 5 or fewer responses, we didn't allow viewing of that school's data. The reason: with answers to things like "do you support your principal?" visible when there were only, say, 3 teachers at a school, and 2 of them answering "no", it was way too easy to determine who the respondents were. Even with '5' as a cutoff, it still felt a bit dicey for some schools.
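
A rough sketch of that kind of cutoff, not the actual project code: the school names, answers, and the `school_report` / `district_total` helpers are invented for illustration, and the threshold just mirrors the "5 or fewer" rule described above.

```python
MIN_RESPONSES = 6  # schools with 5 or fewer responses are hidden

# Hypothetical survey answers to "do you support your principal?", per school.
responses = {
    "school_a": ["yes"] * 9 + ["no"] * 3,
    "school_b": ["yes", "no", "no"],  # only 3 teachers responded
}

def school_report(answers, min_responses=MIN_RESPONSES):
    """Return yes/no counts for one school, or None if the group is too small."""
    if len(answers) < min_responses:
        return None  # suppressed: too easy to guess who answered what
    return {"yes": answers.count("yes"), "no": answers.count("no")}

def district_total(by_school):
    """Totals include every response, even from schools that are hidden."""
    all_answers = [a for answers in by_school.values() for a in answers]
    return {"yes": all_answers.count("yes"), "no": all_answers.count("no")}

print(school_report(responses["school_a"]))
print(school_report(responses["school_b"]))
print(district_total(responses))
```

Where to put the cutoff is a judgment call, which matches the "still felt a bit dicey" experience: a higher `MIN_RESPONSES` hides more schools but makes guessing harder.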