Algorithm can pick out almost any American in supposedly anonymized databases
1. rectan+l4 2019-07-24 10:32:16
>>zoobab+(OP)
The great contribution of Differential Privacy theory is to quantify just how little use you can get out of aggregate data before individuals become identifiable.

Unfortunately, Differential Privacy proofs can be used to justify applications that turn out to leak private data once the proofs are shown to be incorrect. By then the data is already out there and the damage is done.

Nevertheless, it is instructive to see how perilously few queries can be answered before compromise occurs, which gives the lie to the irresponsible idea of "anonymization".
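
To make the "few queries" point concrete, here is a minimal sketch of a differencing attack, assuming a system that answers exact counts. The records, names, and the `count_with_condition` helper are all made up for illustration; nothing here comes from a real dataset or API.

```python
# Toy, hypothetical records: (name, zip_code, has_condition).
records = [
    ("alice", "02139", True),
    ("bob", "02139", False),
    ("carol", "02139", True),
    ("dave", "02140", False),
]


def count_with_condition(rows, predicate):
    """Exact aggregate query: how many matching rows have the condition."""
    return sum(1 for row in rows if predicate(row) and row[2])


# Query 1: everyone in ZIP 02139.
q1 = count_with_condition(records, lambda r: r[1] == "02139")

# Query 2: everyone in ZIP 02139 except the target.
q2 = count_with_condition(
    records, lambda r: r[1] == "02139" and r[0] != "carol"
)

# Neither answer names anyone, but their difference is the target's private bit.
carol_has_condition = (q1 - q2) == 1
print(carol_has_condition)
```

Two aggregate answers are enough here, which is the kind of compromise that noise addition is meant to blunt.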

2. specia+wl 2019-07-24 13:10:43
>>rectan+l4
The part I can't wrap my head around is how to mitigate (future-proof against) unforeseen leakage and correlations. The example I keep going back to is deanonymizing movie reviews (chronologically correlating movie rentals and reviews). And, frankly, I'm just not clever enough to imagine most attacks.

If nothing else, I appreciate the Differential Privacy effort for showing that the problem space is wicked hard.

I worked on medical records and on protecting voter privacy. There's a lot of wishful thinking leading to unsafe practices. Having better models to describe what's what would be nice.

3. bo1024+3x 2019-07-24 14:23:13
>>specia+wl
> The part I can't wrap my head around [...] The example I keep going back to is deanonymizing movie reviews

The reason is that you are thinking of an example that's not nicely compatible with differential privacy. The basic examples of DP would be something like a statistical query: approximately how many people gave Movie X three stars? You can ask a bunch of those queries, with some noise added to each answer, and be protected against re-identification.
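
As a rough sketch of such a noisy statistical query, the snippet below answers "how many people gave Movie X three stars?" with Laplace noise. It is one standard way to do this, not necessarily what the comment has in mind. The names `noisy_count` and `laplace_noise`, the `(person, movie, stars)` tuple layout, and the sample data are assumptions for illustration. It also assumes each person rates a given movie at most once, so one person changes the count by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(ratings, movie, stars, epsilon, rng):
    """Count matching ratings, then add Laplace noise with scale 1/epsilon.

    Assumes a count has sensitivity 1: adding or removing one person's
    rating of this movie changes the true answer by at most 1.
    """
    true_count = sum(1 for (_, m, s) in ratings if m == movie and s == stars)
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: (person, movie, stars).
ratings = [
    ("p1", "Movie X", 3),
    ("p2", "Movie X", 3),
    ("p3", "Movie X", 5),
    ("p4", "Movie Y", 3),
]

rng = random.Random(0)
print(noisy_count(ratings, "Movie X", 3, epsilon=0.5, rng=rng))
```

Under basic composition, the privacy cost of repeated queries adds up: k queries at epsilon each cost at most k * epsilon in total, so "a bunch of queries" still has to fit inside some overall budget.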

You can still try to release a noisy version of the whole database using DP, but it will be very noisy. A basic algorithm (not good) would be something like

    For each entry (person, movie):
      with probability 0.02, keep the original rating
      otherwise, pick a rating at random

(A better one would probably compute a low-rank approximation, then add small noise to that.)
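
The basic per-entry algorithm above is a form of randomized response, and a minimal Python sketch of it follows. The 1 to 5 star scale, the dictionary layout, and the function names are assumptions, not part of the original comment. With keep probability p and k possible ratings, the worst-case likelihood ratio for a single entry is 1 + k*p/(1-p), which for p = 0.02 and k = 5 gives an epsilon of roughly 0.097 per entry.

```python
import math
import random

RATINGS = [1, 2, 3, 4, 5]  # assumed star scale


def randomize_rating(true_rating, keep_prob, rng):
    """Keep the true rating with probability keep_prob, else pick uniformly."""
    if rng.random() < keep_prob:
        return true_rating
    return rng.choice(RATINGS)


def epsilon_per_entry(keep_prob, num_ratings):
    """Worst-case log likelihood ratio for one randomized entry."""
    return math.log(1 + num_ratings * keep_prob / (1 - keep_prob))


def release(db, keep_prob, rng):
    """Return a noisy copy of a {(person, movie): rating} database."""
    return {key: randomize_rating(r, keep_prob, rng) for key, r in db.items()}


# Hypothetical database.
db = {("p1", "Movie X"): 3, ("p1", "Movie Y"): 5, ("p2", "Movie X"): 1}

rng = random.Random(0)
print(release(db, keep_prob=0.02, rng=rng))
print(epsilon_per_entry(0.02, len(RATINGS)))
```

That bound is per entry. A person with many ratings is protected only up to the sum over their entries, and with 98% of ratings replaced the released table is, as the comment says, very noisy.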