zlacker

A client has been asking us for two years to create an anonymized survey, and I've gone to great lengths to devise one where even our own employees (or superusers with full DB access) cannot figure out who a respondent is.

It's been a fun exercise in software architecture, because I actually care about this.

But we keep pushing this annual survey back another year, since we never seem to be ready to actually implement it (due to other priorities).

replies(4): >>mschus+0c >>al_bor+Rr >>daniel+HB >>alexjp+QV

>>atonse+(OP)
There are commercial service providers and open-source projects doing that already.

The thing is, as soon as you allow free-text entry, the exercise becomes moot, assuming you've got a solid corpus of emails to train an AI on. It's basically the same approach that Wikipedia activists used two decades ago to identify "sockpuppet" accounts.
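To make the free-text risk concrete, here is a minimal sketch of stylometric matching. It assumes character trigram counts compared by cosine similarity, which is a much cruder stand-in for a trained model, and the function names (`trigram_profile`, `cosine_similarity`, `most_likely_author`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in the lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two trigram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_likely_author(response: str, known_emails: dict) -> str:
    """Return the author whose known writing looks most like the response.

    known_emails maps an author name to a sample of their writing
    (hypothetical input; any existing email archive would do).
    """
    profile = trigram_profile(response)
    return max(
        known_emails,
        key=lambda author: cosine_similarity(
            profile, trigram_profile(known_emails[author])
        ),
    )
```

Even something this simple will rank candidates. With a few hundred emails per person and a real classifier, the ranking gets uncomfortably good.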

replies(2): >>nights+Ne >>kevin_+Fz

>>mschus+0c
Unless you add a step where you ask an AI to paraphrase what the message is about.

replies(1): >>mschus+el

>>nights+Ne
Good point, but it's also liable to get crucial information and details lost or, worse, completely misunderstood by an AI which by definition lacks contextual knowledge.

>>atonse+(OP)
I built a suggestion box for a team at work like this. It was pretty basic. The page had no login, and no tracking of any kind. The DB only had an index, the date, and the suggestion. The source was available to everyone who would use it, and if they wanted I would have shown them the DB. These people also had root access to the server it ran on, so if they were really paranoid they could clear any system logs. The site was also heavily used for day-to-day work, so the noise from everyone on the page would obscure any ability to tie a single IP to a timestamp without a lot of effort and a large chance of error.
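A minimal sketch of storage like that, assuming SQLite and hypothetical table and column names (the only thing taken from the description is the three fields: an index, the date, and the suggestion). Storing a calendar date instead of a full timestamp is an assumption on my part; it makes it harder to line a row up with access logs.

```python
import sqlite3
from datetime import date


def open_box(path: str = ":memory:") -> sqlite3.Connection:
    """Open the suggestion DB; nothing about the submitter is stored."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS suggestions ("
        " id INTEGER PRIMARY KEY,"       # the index
        " submitted_on TEXT NOT NULL,"   # date only, no time of day
        " body TEXT NOT NULL)"           # the suggestion text
    )
    return conn


def add_suggestion(conn: sqlite3.Connection, body: str, today=None) -> None:
    """Insert one suggestion with the current date (or a supplied one)."""
    day = (today or date.today()).isoformat()
    conn.execute(
        "INSERT INTO suggestions (submitted_on, body) VALUES (?, ?)",
        (day, body),
    )
    conn.commit()
```

Note that an incrementing index still reveals the order of submissions, so on a low-traffic box it can narrow things down more than the date does.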

Over the course of 4 years I think it was only used 3 times. Most people assumed it was some kind of trap. It wasn't; I genuinely wanted honest feedback, and thought some people were too shy to speak up in a group setting, so I wanted to give them options.

replies(1): >>JohnFe+Gf3

>>mschus+0c
Just run it through encheferizer.

replies(1): >>Walter+DL

>>atonse+(OP)
I have a few friends working at CultureAmp (who - amongst other things - do anonymous employee surveys).

Management can 'drill down' to get information on how specific teams responded.

One of the things they mentioned doing is using a statistical (differential privacy?) model to limit the depth, so that no specific person's responses are revealed unless they are grouped with a substantial number of other responses.

Surprisingly difficult when you consider e.g. a team lead reading a statement like "of the 10 people in your team, one is highly dissatisfied with management" - they have personal knowledge of the situation and are going to know which person it is.
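For illustration only, the simplest version of "limit the depth" is a minimum group size below which a drill-down returns nothing. This is plain threshold suppression, not differential privacy, and not a description of what CultureAmp actually does; `MIN_GROUP_SIZE` and `team_averages` are hypothetical names and the threshold of 5 is an arbitrary assumption.

```python
from collections import defaultdict
from statistics import mean

MIN_GROUP_SIZE = 5  # assumed threshold, not a figure from any real product


def team_averages(responses, min_group_size=MIN_GROUP_SIZE):
    """Average score per team, suppressed (None) for teams that are too small.

    responses: iterable of (team, score) pairs.
    """
    by_team = defaultdict(list)
    for team, score in responses:
        by_team[team].append(score)
    return {
        team: (mean(scores) if len(scores) >= min_group_size else None)
        for team, scores in by_team.items()
    }
```

A threshold like this does nothing for the team-lead case above: a team of 10 clears it, and the aggregate alone is enough for someone with personal knowledge to work out who the unhappy person is.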

replies(1): >>gblarg+Qd1

>>kevin_+Fz
Zee theeng is, es suun es yuoo elloo free-a-text intry, zee ixerceese-a becumes muut essoomeeng yuoo gut a suleed treeening curpoos ooff imeeels tu treeen un EI oon - beseecelly zee seme-a eppruech thet Veekipedia ecteefists used tu du tvu decedes egu tu determeene-a "suckpooppet" eccuoonts. Bork Bork Bork!

>>atonse+(OP)
When I was in high school I worked at the helpdesk for a small defense contractor. The developers there spent their downtime building internal-use IT tools. In those days they still wrote a lot of stuff in Lotus Domino, a tool that let you use a Notes database as the back-end for an SSR web app. Our ticketing system was written with it.

They later decided to adopt it for an annual IT satisfaction survey that they sent out to users. In an ideal world we wouldn't participate, because the respondents were grading my team's performance, but we got invites because we were part of the Exchange distro the message was sent to. I quickly discovered that the dev team had left a bunch of default routes enabled, so we were able to view a list of all responses and see who submitted which. We knew our customers well enough that we could reliably attribute most of the negative responses via the free-text comments field anyhow, but the fact that anybody could explicitly see everybody else's response wasn't great.

I suppose the NTLM-authenticated username in the server logs would convey the same info but at least that'd require CIFS/RDP access to the web server...

>>daniel+HB
Couldn't this be lessened by intentionally introducing false information, e.g. specifying that 10% of the time the response will be randomized?
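This idea is usually called randomized response. A minimal sketch, assuming a yes/no question where the answer is replaced by a fair coin flip 10% of the time; the function names are hypothetical and the coin-flip rule is just one possible way to "randomize".

```python
import random


def randomize(answer: bool, p: float = 0.1, rng=random) -> bool:
    """With probability p, report a fair coin flip instead of the true answer."""
    if rng.random() < p:
        return rng.random() < 0.5
    return answer


def estimate_true_rate(observed_rate: float, p: float = 0.1) -> float:
    """Undo the noise in aggregate.

    observed = (1 - p) * true + p * 0.5, so
    true = (observed - p / 2) / (1 - p).
    """
    return (observed_rate - p / 2) / (1 - p)
```

The aggregate can still be recovered with a known correction, while any single stored answer carries some deniability. At 10% that deniability is thin: a reported "yes" is still a true "yes" at least 90% of the time.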

>>al_bor+Rr
> Most people assumed it was some kind of trap.

In most of the places I've worked, I would have assumed the same.

The thing is that there is no real technological solution that would instill trust in someone who doesn't already have it. In the end, all such privacy solutions necessarily boil down to "trust us", because it's not practical or reasonable to perform the sort of deep analysis that would be required to confirm privacy claims.

You may have provided the source, for instance, but that doesn't give reassurance that the binary that is executing was compiled from that source.