zlacker

1. specia+(OP) 2019-07-24 13:10:43
The part I can't wrap my head around is how to mitigate (and future-proof against) unforeseen leakage and correlations. The example I keep going back to is deanonymizing movie reviews (chronologically correlating movie rentals with public reviews). And, frankly, I'm just not clever enough to imagine most attacks.

If nothing else, I appreciate the differential privacy effort for showing that the problem space is wicked hard.

I've worked in medical records and in voter privacy protection. There's a lot of wishful thinking leading to unsafe practices, and better models for describing what's what would be nice.

replies(1): >>bo1024+xb
2. bo1024+xb 2019-07-24 14:23:13
>>specia+(OP)
> The part I can't wrap my head around [...] The example I keep going back to is deanonymizing movie reviews

The reason is that the example you're thinking of isn't a natural fit for differential privacy. The basic DP examples are statistical queries: approximately how many people gave Movie X three stars? You can ask a bunch of those queries, with some noise added to each answer, and the people in the database stay protected against re-identification.
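
To make that concrete, here's a minimal sketch of one noisy counting query in Python. Everything here (the toy data, the epsilon, the function names) is made up for illustration, not something from a real system:

    import numpy as np

    # Toy data: each person's star ratings, keyed by movie.
    ratings = {"p1": {"X": 3, "Y": 5}, "p2": {"X": 3}, "p3": {"X": 4}}

    def noisy_count(predicate, epsilon):
        # Adding or removing one person changes the count by at most 1
        # (sensitivity 1), so Laplace noise with scale 1/epsilon makes
        # this single query epsilon-differentially private.
        true_count = sum(1 for person in ratings.values() if predicate(person))
        return true_count + np.random.laplace(0.0, 1.0 / epsilon)

    # "Approximately how many people gave Movie X three stars?"
    print(noisy_count(lambda p: p.get("X") == 3, epsilon=0.5))

The catch is that every query spends privacy budget, so the more questions you ask, the more noise (or the less total privacy) you have to accept.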

You can still try to release a noisy version of the whole database using DP, but it will be very noisy. A basic algorithm (essentially randomized response, and not a good one here) would be something like

    import random
    # Keep each original 1-5 star rating with probability 0.02; otherwise
    # replace it with a uniformly random rating (randomized response).
    noisy = {entry: rating if random.random() < 0.02 else random.randint(1, 5)
             for entry, rating in ratings.items()}  # entry = (person, movie)
(A better one would probably compute a low-rank approximation, then add small noise to that.)
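
For the curious, a rough numpy sketch of that idea. The matrix shape, rank, and noise scale are placeholders I picked; calibrating the noise to the sensitivity of the low-rank step is the genuinely hard part, and this doesn't do it:

    import numpy as np

    # Toy dense ratings matrix: rows are people, columns are movies (made up).
    R = np.random.randint(0, 6, size=(100, 40)).astype(float)

    # Rank-k approximation via truncated SVD.
    k = 5
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    low_rank = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Release the low-rank reconstruction plus small noise; the scale (1.0)
    # is a placeholder, not chosen for any particular epsilon.
    release = low_rank + np.random.laplace(0.0, 1.0, size=low_rank.shape)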