zlacker

[return to "Algorithm can pick out almost any American in supposedly anonymized databases"]
1. polski+t7[view] [source] 2019-07-24 11:13:49
>>zoobab+(OP)
Would differential privacy fix this problem? I heard that the new US census will use it.
2. majos+k8[view] [source] 2019-07-24 11:25:46
>>polski+t7
Yes, in the sense that the output of a differentially private protocol has mathematical guarantees against re-identification, regardless of the computational power or side information an adversary has.

There are caveats. The exact strength of the guarantee depends on the privacy parameter epsilon you choose and on how many releases you make from the same data: by the basic composition theorem, k releases that are each epsilon-differentially private are only guaranteed to be (k*epsilon)-differentially private taken together. So simply saying "we use a differentially private algorithm" doesn't guarantee meaningful privacy in isolation.
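
A quick illustration of why the number of releases matters (a toy sketch, assuming numpy; the count and budget values are made up): if the same noisy count is released often enough, the noise simply averages away.

    import numpy as np

    rng = np.random.default_rng(0)
    true_count = 42      # hypothetical true answer
    epsilon = 0.1        # per-release privacy budget

    # Release the same counting query 1000 times; each release adds
    # fresh Laplace(1/epsilon) noise (a count has sensitivity 1).
    answers = true_count + rng.laplace(scale=1 / epsilon, size=1000)

    # The average concentrates near the true count, and by composition
    # the combined release is only (1000 * epsilon)-DP, i.e. essentially
    # no privacy left.
    print(answers.mean())  # close to 42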

3. shusso+x9[view] [source] 2019-07-24 11:38:32
>>majos+k8
do you have some examples?
4. majos+ia[view] [source] 2019-07-24 11:43:34
>>shusso+x9
Of a differentially private algorithm? Frank McSherry (one of the authors of the original differential privacy paper) has a nice blog post introducing the idea and giving many examples with code [1].

Or even more briefly: if you want to know how many people in your database have characteristic X, you can compute the true count, add Laplace(1/epsilon) noise [2], and release the result. That's epsilon-differentially private. In general, if you're computing a statistic with sensitivity s (one person can change the statistic by at most s), then adding Laplace(s/epsilon) noise makes the release epsilon-differentially private (see e.g. Theorem 3.6 here [3]). The intuition is that, by scaling the added noise to the sensitivity, you cover up the presence or absence of any one individual.
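
Here's a minimal sketch of that mechanism in Python (assuming numpy; the data and parameter values are illustrative, not from any real deployment):

    import numpy as np

    def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
        # Releases true_value + Laplace(sensitivity / epsilon) noise.
        # This is epsilon-DP whenever `sensitivity` bounds how much
        # one person's record can change true_value.
        rng = rng or np.random.default_rng()
        return true_value + rng.laplace(scale=sensitivity / epsilon)

    # Counting query: how many people have characteristic X?
    # One person changes a count by at most 1, so sensitivity = 1.
    has_x = [True, False, True, True, False]  # toy database
    noisy_count = laplace_mechanism(sum(has_x), sensitivity=1, epsilon=0.5)
    print(noisy_count)  # the true count (3) plus Laplace(scale=2) noise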

[1] https://github.com/frankmcsherry/blog/blob/master/posts/2016...

[2] https://en.wikipedia.org/wiki/Laplace_distribution

[3] http://cis.upenn.edu/~aaroth/privacybook.html
