
1. bo1024 (OP) 2019-07-24 14:23:13
> The part I can't wrap my head around [...] The example I keep going back to is deanonymizing movie reviews

The reason is that you are thinking of an example that's not nicely compatible with differential privacy. The basic examples of DP would be something like a statistical query: approximately how many people gave Movie X three stars? You can ask a bunch of those queries, with some noise added to each answer, and be protected against re-identification.
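
To make the statistical-query case concrete, here's a minimal sketch of one standard way to answer it, the Laplace mechanism. The assumptions are mine, not anything from the thread: ratings are stored as (person, movie, stars) tuples, each person rates a given movie at most once (so one person changes the count by at most 1), and `noisy_count` and `epsilon` are just illustrative names.

```python
import random


def noisy_count(ratings, movie, stars, epsilon, rng=None):
    """Answer 'how many people gave `movie` `stars` stars?' with Laplace noise.

    Assumes `ratings` is an iterable of (person, movie, stars) tuples and that
    each person rates a movie at most once, so the count has sensitivity 1.
    """
    rng = rng or random.Random()
    true_count = sum(1 for (_, m, s) in ratings if m == movie and s == stars)
    # The difference of two independent Exponential(rate=epsilon) draws is
    # Laplace-distributed with scale 1/epsilon, which is the noise the
    # Laplace mechanism calls for when the sensitivity is 1.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    data = [("alice", "Movie X", 3), ("bob", "Movie X", 3), ("carol", "Movie X", 5)]
    print(noisy_count(data, "Movie X", 3, epsilon=0.5))
```

Smaller `epsilon` means more noise and more privacy. Every query answered this way spends some of the privacy budget, so the epsilons add up if you ask a lot of them.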

You can still try to release a noisy version of the whole database using DP, but it will be very noisy. A basic algorithm (not good) would be something like:

    For each entry (person, movie):
      with probability 0.02, keep the original rating
      otherwise, pick a rating at random
(A better one would probably compute a low-rank approximation of the ratings matrix, then add small noise to that.)
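
Here's a rough runnable version of the basic keep-or-randomize algorithm, as a sketch only. Assumptions I'm adding: ratings are on a 1 to 5 star scale, "pick a rating at random" means uniformly over all five values (possibly landing on the original), and the set of (person, movie) pairs is treated as public. The names `randomize_ratings` and `per_entry_epsilon` are made up for illustration.

```python
import math
import random

STARS = [1, 2, 3, 4, 5]  # assumed rating scale, not stated in the comment


def randomize_ratings(ratings, keep_prob=0.02, rng=None):
    """Randomized response over a dict {(person, movie): rating}."""
    rng = rng or random.Random()
    noisy = {}
    for entry, rating in ratings.items():
        if rng.random() < keep_prob:
            noisy[entry] = rating             # keep the original rating
        else:
            noisy[entry] = rng.choice(STARS)  # otherwise pick uniformly at random
    return noisy


def per_entry_epsilon(keep_prob=0.02, k=len(STARS)):
    """Epsilon for a single entry under the uniform-replacement assumption."""
    p_true = keep_prob + (1 - keep_prob) / k   # chance the output equals the true rating
    p_other = (1 - keep_prob) / k              # chance of any one specific other rating
    return math.log(p_true / p_other)


if __name__ == "__main__":
    db = {("alice", "Movie X"): 3, ("bob", "Movie X"): 5}
    print(randomize_ratings(db))
    print(per_entry_epsilon())
```

Under those assumptions the per-entry epsilon works out to roughly 0.1, and it composes across all of one person's ratings, which is part of why the released database has to be so noisy to be useful as a privacy guarantee.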