zlacker

1. JimmyR+(OP) 2019-11-02 17:44:57
Let's take web logs. What does "anonymized" mean? Even when datasets get "anonymized", information leaks at scale.

Some of those IP addresses are static; how could you tell whose?

Some of those URLs may end up with some form of PII in the query string, put there by some forgotten backend service.

Some of those user agents end up carrying a Kaspersky unique identifier.

Someone saves your website, and when they open the saved copy some tag re-fires, capturing that they opened it from /Users/John.Smith/yourpage.html.

Facebook, Google and others add link-decoration tracking to the URL, and suddenly a unique identifier appears across various hits, even if it wasn't added by the site owner.
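To illustrate the link-decoration case, here is a minimal sketch of stripping click-ID parameters from a logged URL. The function name and the parameter list are my own assumptions; `fbclid` and `gclid` are only two examples, and a deny-list like this only catches identifiers you already know about, which is rather the point.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed, deliberately incomplete list of click-ID parameters.
# Real logs will contain others that nobody has catalogued yet.
TRACKING_PARAMS = {"fbclid", "gclid"}


def strip_tracking_params(url: str) -> str:
    """Return the URL with known click-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Rebuild the URL with only the parameters that survived the filter.
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_tracking_params("https://example.com/page?id=7&fbclid=AbC123")
```

Anything not on the list passes straight through, so PII sitting in an unexpected parameter would still end up in the "anonymized" dataset.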

There may be account identifiers, hashes or tokens linked to emails or phone numbers. So while the log dataset is marked as low risk if lost, losing some of the mapping tables at any point in the future would turn it into full PII.
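A toy sketch of the mapping-table problem, with entirely made-up data: the tokens, paths, email address and the `reidentify` helper are all hypothetical. The log on its own looks harmless; joined against a leaked mapping table it is not.

```python
# Hypothetical "pseudonymized" log rows: (account token, requested path).
log_rows = [
    ("a1f3", "/billing"),
    ("9c2e", "/health-check"),
    ("a1f3", "/settings"),
]

# Hypothetical mapping table stored elsewhere: token -> email.
token_to_email = {"a1f3": "john.smith@example.com"}


def reidentify(rows, mapping):
    """Join log rows against a mapping table; unmapped tokens become None."""
    return [(mapping.get(token), path) for token, path in rows]


linked = reidentify(log_rows, token_to_email)
```

Every row whose token appears in the table is now tied to a person, and the repeated token links their hits together even before the table leaks.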
