zlacker

1. tialar+(OP) 2019-07-24 10:36:35
If people tell you they're collecting data for statistical purposes, then one of three things should hold:

1. They should deliberately introduce noise into the raw data. Nazis with the raw census data can spend all month trying to find the two 40-something Jews that data says live on this island of 8400 people, but they were just noise. Or were they? No way to know.

2. They should bucket everything and discard all raw data immediately. This hampers future analysis, so the buckets must be chosen carefully, but it is often enough for real statistical work, and you can often just collect the data again later if you realise you needed different buckets.

3. They shouldn't collect _anything_ personally identifiable. This is hard because it could be almost anything at all. If you're 180cm tall your height doesn't seem personally identifiable, but ask Sun Mingming. If you own a Honda Civic then the model of your car doesn't seem personally identifiable, but ask somebody in a Rolls Royce Wraith Luminary...
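
A rough sketch of options 1 and 2, under some loud assumptions: the noise here is Laplace noise added to a published count (one common variant; perturbing each raw record is another), and the function names, the `scale` parameter and the 10-year bucket width are all made up for illustration.

```python
import random

def noisy_count(true_count, scale=2.0, rng=random):
    """Return the count plus Laplace(0, scale) noise, clamped at zero.

    The difference of two independent exponential draws with the same
    mean is Laplace-distributed. `scale` is an illustrative knob.
    """
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return max(0, round(true_count + noise))

def bucket_age(age, width=10):
    """Map an exact age to a coarse label such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def bucketed_histogram(ages, width=10):
    """Count people per age bucket; the exact ages can then be discarded."""
    counts = {}
    for age in ages:
        label = bucket_age(age, width)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

With something like this, a published "2 people aged 40-49" could be two real people, one, or none, which is the "Or were they?" effect in option 1.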

replies(3): >>pas+g1 >>GhostV+66 >>MaxBar+Ys1
2. pas+g1 2019-07-24 10:52:33
>>tialar+(OP)
Nazis used the census because it was there. Were it not, they would have just gone house to house, which they regularly did anyway.

The antidote to oppression is not the Index statisticum prohibitorum, but quite the opposite: education, and in particular educating people about how different each and every one of us is, and yet how little it takes to get along well.

replies(1): >>pjc50+M1
3. pjc50+M1 2019-07-24 10:59:11
>>pas+g1
Today's Nazis would probably use machine learning to identify "characteristically Jewish facial features".

(Or, of course, simply pass a law that entitles them to deport or hold indefinitely anyone who can't prove they're a real Aryan with only the papers they have on them at the time)

replies(2): >>wayout+y6 >>tal8d+s8
4. GhostV+66 2019-07-24 11:46:48
>>tialar+(OP)
> They shouldn't collect _anything_ personally identifiable

Why not just ensure that any personally identifiable data is properly bucketed, and discarded if it is too strongly identifiable? If you are storing someone's height, age, and gender, you can just increase the bucket size for those fields until every combination of identifiable fields occurs several times in the dataset. If there are always a few different records with well distributed values for every combination of identifiable fields, you can't infer anything about an individual based on which buckets they fall into.
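
A minimal sketch of that "increase the bucket size until every combination occurs several times" loop. The record layout `(height_cm, age, gender)`, the candidate `widths`, and the choice to use one shared width for both numeric fields are simplifying assumptions for illustration, not a recommended scheme.

```python
from collections import Counter

def generalize(record, width):
    """Coarsen (height_cm, age, gender) by rounding numeric fields down to a multiple of `width`."""
    height, age, gender = record
    return (height // width * width, age // width * width, gender)

def widen_until_k_anonymous(records, k, widths=(1, 5, 10, 20, 50)):
    """Try each bucket width in order; return the first one where every
    combination of bucketed fields occurs at least k times.

    Returns (width, counts), or (None, None) if no candidate width works,
    in which case the fields would have to be dropped instead.
    """
    if not records:
        return None, None
    for width in widths:
        counts = Counter(generalize(r, width) for r in records)
        if min(counts.values()) >= k:
            return width, counts
    return None, None
```

For example, six people with heights 165-183cm and ages 30-47 might all be unique at width 1 and 5 but collapse into two groups of three at width 10.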

replies(1): >>majos+D7
5. wayout+y6 2019-07-24 11:50:12
>>pjc50+M1
Probably the latter, considering any sort of genetic or scientific testing that was reasonably accurate at identifying people with Jewish ancestry would also implicate much of German high command (including Hitler).

Remember: fascists don't believe in things because they are true, but because they are a means to an end. Their ultimate goal is authoritarian control, and an administrative mechanism is far more effective toward that goal than a scientific one.

6. majos+D7 2019-07-24 12:01:55
>>GhostV+66
Not a bad idea! It sounds pretty similar to k-anonymity [1], which is not a terrible privacy heuristic. But it does have some specific weaknesses. Wikipedia has a good description.

> Homogeneity Attack: This attack leverages the case where all the values for a sensitive value within a set of k records are identical. In such cases, even though the data has been k-anonymized, the sensitive value for the set of k records may be exactly predicted.

> Background Knowledge Attack: This attack leverages an association between one or more quasi-identifier attributes with the sensitive attribute to reduce the set of possible values for the sensitive attribute.
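
To make the homogeneity attack concrete, here is a small sketch that flags groups whose members all share one sensitive value. The row layout `(quasi_identifier_tuple, sensitive_value)` and the function name are assumptions for illustration.

```python
from collections import defaultdict

def homogeneous_groups(rows):
    """Return {quasi_identifier: sensitive_value} for every group in which
    all records share the same sensitive value.

    Anyone known to fall into such a group has their sensitive value
    revealed, no matter how many records the group contains.
    """
    groups = defaultdict(set)
    for quasi_id, sensitive in rows:
        groups[quasi_id].add(sensitive)
    return {q: next(iter(vals)) for q, vals in groups.items() if len(vals) == 1}
```

If every record bucketed as ("40-49", "m") has the same diagnosis, the dataset can be 2-anonymous or 200-anonymous and still leak that diagnosis for anyone known to be in the bucket.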

Optimal k-anonymization is also computationally hard [2].

[1] https://en.wikipedia.org/wiki/K-anonymity

[2] https://dl.acm.org/citation.cfm?id=1055591

7. tal8d+s8 2019-07-24 12:09:11
>>pjc50+M1
This hypothetical 21st century National Socialist German Workers' Party would likely be a little more subtle than blatantly enforcing existing immigration law. They'd obviously be interested in establishing a common enemy, even if it requires massive exaggeration mixed with complete fabrication. That would do them no good if domestically targeted propaganda were illegal though... hopefully the combination of laws and executive orders preventing that doesn't get relaxed by the two administrations preceding this cringy edgelord scenario.
8. MaxBar+Ys1 2019-07-24 20:42:12
>>tialar+(OP)
To points 1 and 2: It's proven very difficult to sanitize datasets in a way that ensures anonymity without rendering them useless. You aren't the first to think of these kinds of transformations.

There are problems with Point 3: we're continually surprised by how effectively smart people can identify individuals in datasets expected to be 'safe'. You've also not accounted for the fact that a collection of individually non-identifying attributes may become identifying in combination.
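
A toy illustration of that last point, with invented field names and data: measure what fraction of records is unique on a given set of fields, first one field at a time and then combined.

```python
from collections import Counter

def unique_fraction(records, fields):
    """Fraction of records whose values for `fields` match no other record."""
    if not records:
        return 0.0
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)

# Hypothetical data: neither field is unique alone, but every pair is.
people = [
    {"height": 180, "car": "Civic"},
    {"height": 180, "car": "Golf"},
    {"height": 170, "car": "Civic"},
    {"height": 170, "car": "Golf"},
]
alone = (unique_fraction(people, ["height"]), unique_fraction(people, ["car"]))
combined = unique_fraction(people, ["height", "car"])
```

Here nobody is unique on height or car alone, yet everybody is unique on the pair.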

That said, the GDPR is largely about prohibiting unnecessary data collection, in the spirit of Point 3. Hopefully it'll help at least a little.
