Individual anonymity in a technical sense is impossible in an environment with network-connected sensors. Above a critical mass of sensors, which we have far exceeded in most urbanized areas, there are no technical measures that can keep a person from being tracked.
This does not seem that surprising or a new technological development.
Some of those IP addresses are static; how could you tell whose?
Some of those URLs may end up with some form of PII in the query string from some forgotten backend service.
Some of those user agents ended up carrying a Kaspersky unique identifier.
Someone saves your website, and when they open it some tag re-fires, capturing that they opened it from /Users/John.Smith/yourpage.html.
Facebook, Google and others add link decoration tracking to the URL, and suddenly a unique identifier appears across various hits, even if it wasn't added by the site owner.
There may be account identifiers, hashes or tokens linked to emails or phone numbers. So while the log dataset is marked as low risk if lost, losing some of the mapping tables at any point in the future would turn it into full PII.
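A rough sketch of how a couple of these leaks could be spotted in logged URLs. The function name and the regexes are made up for illustration and are nowhere near exhaustive; `fbclid` and `gclid` are the click-ID parameters that Facebook and Google append to links.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Illustrative patterns only; real logs would need a much broader set.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HOME_PATH_RE = re.compile(r"/(?:Users|home)/([^/]+)/")
# Link-decoration parameters added by third parties.
CLICK_ID_PARAMS = {"fbclid", "gclid"}


def pii_findings(url):
    """Return (kind, value) pairs for identifiers spotted in one logged URL."""
    findings = []
    parts = urlsplit(url)

    # A page saved to disk and reopened leaks the local account name in its path.
    match = HOME_PATH_RE.search(parts.path)
    if match:
        findings.append(("local_username", match.group(1)))

    # parse_qsl percent-decodes values, so "jane%40example.com" is caught too.
    for key, value in parse_qsl(parts.query):
        if key in CLICK_ID_PARAMS:
            findings.append(("click_id", value))
        if EMAIL_RE.search(value):
            findings.append(("email", value))
    return findings


if __name__ == "__main__":
    for logged in (
        "file:///Users/John.Smith/yourpage.html",
        "https://example.com/page?fbclid=abc123&email=jane%40example.com",
    ):
        print(logged, pii_findings(logged))
```

A scan like this only catches the patterns you already thought of, which is the point being made above: the identifiers tend to show up in places nobody planned for.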
I think the gap for most people is not the existence of these sensors, which capture nothing about a person in any kind of direct way, or that people perturb their environment in some abstract way, but the existence of analytic techniques that allow someone to reconstruct detailed personal information from large collections of extremely oblique measurements of the broader environment.
The analytic methods for doing this type of reconstruction are quite clever and non-obvious, which I guess would need to be the case for it to be surprising. It is nothing at all like typical web or enterprise analytics -- you are using measured physics and constraints on that physics to infer environmental dynamics that you can't measure directly.
The problem is that most data surveillance systems store the raw source data instead of just keeping metrics in aggregate form. Thus, it's almost always possible to de-anonymize data.
There are ways to create anonymous datasets, but they generally involve aggregation, not just removing the identifiers.
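To make the distinction concrete, here is a minimal sketch comparing the two approaches. The field names (`user_id`, `zip`, `page`) and the `min_count` threshold are assumptions for illustration, and suppressing small groups is only one simple aggregation technique, not a formal privacy guarantee.

```python
from collections import Counter


def strip_identifiers(rows):
    """Drop the explicit ID but keep row-level data; rare rows stay linkable."""
    return [{k: v for k, v in row.items() if k != "user_id"} for row in rows]


def aggregate(rows, keys, min_count=5):
    """Count rows per group and suppress groups smaller than min_count."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {group: n for group, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    rows = [{"user_id": i, "zip": "10001", "page": "/a"} for i in range(5)]
    rows.append({"user_id": 99, "zip": "99999", "page": "/b"})

    # The lone zip 99999 visitor is still an individual row after stripping IDs.
    print(strip_identifiers(rows)[-1])
    # After aggregation the small group is gone and only the count of 5 remains.
    print(aggregate(rows, ["zip", "page"]))
```

The stripped dataset still contains one row per person, so anyone who knows there is a single visitor from that zip code can pick them out. The aggregated output has nothing left to link.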
It's unfortunate that the typical layperson's understanding of the concepts at work here doesn't tend to extend to this distinction. It would help drive privacy conversations if this were more commonly understood.
> It isn’t all bad news. These same reidentification techniques were used by journalists working at the New York Times earlier this year to expose Donald Trump’s tax returns from 1985 to 1994.
Flippant comments like that make it hard to take the authors seriously. Their concern for privacy apparently evaporates when the techniques are applied against people they don't like.
https://news.ycombinator.com/item?id=20513521 (261 points/94 comments)
Conflicts of interest, crimes, lies about where and how they got their money, whether they're really as wealthy as they claim to be, and whether they've cheated on their taxes or paid unfairly low taxes considering their enormous wealth are all things that could influence these critically important decisions on the part of the public.
A further argument is that officials serving in public office don't have the same expectation of privacy that private citizens do.
In view of these two arguments and others, it's not difficult to see why the authors of this article might consider the revelation of Trump's tax returns a good thing for reasons other than simply not liking him.
Further, there is no evidence in the article that its authors would not be concerned about the privacy rights of other people they don't like who are neither the President of the US nor public officials.
Up until the point that access to Trump's tax returns was mentioned, the article was warning about the false privacy associated with anonymizing identity.
I can understand the argument that candidates should reveal their financial history. But that doesn't mean otherwise reasonable concerns about false anonymity should be suspended when talking about the anonymity of one particular person who has explicitly asserted their privacy rights.
Even if you think the authors were making a more general statement about all candidates and not just Trump, that seems like a terrible argument to me. In the case of candidates for office, voters are free to penalize candidates who don't reveal enough information about themselves by not voting for them. There is no need to soften any privacy concerns about anonymized identities.
Compare these two hypothetical scenarios:
1 - Voters don't have access to the candidate's tax records
2 - Due to released tax records, the voters know for certain all of the below facts about the candidate: A - The candidate paid no taxes, B - The candidate cheated on their taxes, C - The candidate is not as rich as they claim to be, D - The candidate's businesses lost money so they're not as good a businessperson as they claim to be
In the first hypothetical scenario the voters know there's a possibility that the candidate might be hiding something. In the second hypothetical scenario the voters know for certain that the candidate is a lawbreaking, tax-cheating, lying hypocrite.
In which of these hypothetical scenarios do you think the voters are going to penalize the candidate more?
A released tax records, B released tax records, C didn't release any records.
FWIW, I've talked to an accountant about the idea of Trump revealing his tax records, and the bottom line is that they would be sufficiently complicated that there is no possibility the average person would be able to interpret them accurately. So you'll be left with the spin from the various media organizations, hardly a source of objective truth.
So I would assert that requiring candidates to release their tax records doesn't actually provide any useful information for a voter.
Remember that Trump's tax records have already been examined by the IRS and, I believe, have been audited. So there shouldn't be any question of illegal activity being hidden, unless you want to assert that the IRS can't be trusted either.
There are also other concerns about tax records revealing information about 3rd parties. And finally tax records aren't really a useful way to understand the intricacies of a business. If you are really interested in that you would want the audit report for the underlying business and not just tax records.