Image Scrubber: tool for anonymizing photographs taken at protests

>>dsr12+(OP)
I recently found myself in a position where I had to blur a ton of faces from multiple pictures (about 100/day).

It’s really tedious to do manually, and this is where something like OpenCV shines.

We found a repo [1] with Python code that automatically detects and blurs faces. It was one of many such scripts, but it stood out for its accuracy: over 90%.

Removing exif data is a great idea.

[1] github.com/telesoho/faceblur

>>shivek+Q2
I’m reminded of a Reddit thread a while back about the US government paying a large sum to create an “unblur” function for Photoshop. Someone in the comments was able to rotate and flip a photo and use the Photoshop blur tool to effectively undo a blur for free.

Perhaps it’s better to remove the section of the photo with a person’s face instead? Or draw a shape over their face and flatten the image? It seems to me that as long as the pixels are there, the identifying data is there for anyone willing to spend the time and effort to find it.

Edit: Apparently it was Interpol, not the US government. I can't find the Reddit thread but here's a NYT article with the photo: https://thelede.blogs.nytimes.com/2007/10/08/interpol-untwir...

>>elliek+05
That wasn't really a "blur", though. A swirl like that is just moving pixels around. If you know how the algorithm works, you can reverse it. It was probably done with a common program like Adobe Photoshop or GIMP. One could write a program that would just "unswirl" with various parameters and generate a bunch of images, and a human could pick out the one that looks like an unswirled image. If you can pick out the right parameters for the unswirl, then no image information is lost.
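
To make the "just moving pixels around" point concrete, here is a toy sketch in Python. It is not the actual swirl filter from Photoshop or GIMP. `shear_rows` is a made-up stand-in that only rearranges pixels (it cyclically shifts each row by an amount that depends on a parameter `k`), and the names are hypothetical. The point is only that a pure rearrangement can be undone exactly once the parameter is guessed.

```python
def shear_rows(img, k):
    """Cyclically shift row r to the right by r*k pixels (a pure rearrangement)."""
    out = []
    for r, row in enumerate(img):
        s = (r * k) % len(row)
        out.append(row[-s:] + row[:-s] if s else row[:])
    return out


def unshear_rows(img, k):
    """Undo shear_rows by shifting each row back by the same amount."""
    return shear_rows(img, -k)


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]

scrambled = shear_rows(img, 1)

# Someone who does not know k can simply try every value and look at the results.
candidates = [unshear_rows(scrambled, k) for k in range(4)]
```

One of the candidates is the original image, pixel for pixel, which is the "generate a bunch of images and pick the right one" approach in miniature.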

That can't be done with a blur. In a blur, pixels are merged or averaged together and information is lost. In some cases you could sharpen it a little, but it's still not going to be as good as the original image. In a really good blur, even the best sharpen algorithm isn't going to give you something that looks like an identifiable face.

>>siberi+4c
> That can't be done with a blur. In a blur, pixels are merged or averaged together and information is lost.

I'd be careful with that assumption. The only thing that really loses information is the discretization back into the 0-255 range, and that naturally loses very little information.

If you consider the pixels as a large vector of values, you're effectively multiplying it by a matrix (plus discretization afterwards). If that matrix has (near) full rank, you can restore (close to) all the information.

Consider a grayscale image with two pixels, a = 10 and b = 20. I apply a blur that transfers 10% of each pixel to the other one. I end up with 11, 19. I'm left with the information 0.9 a + 0.1 b = 11, 0.1 a + 0.9 b = 19. Clearly this system can be solved uniquely. Or equivalently, the blur matrix (and I don't mean the kernel but the full blur operation matrix) is [[0.9, 0.1], [0.1, 0.9]], which has full rank and is thus invertible.
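
The two-pixel example can be checked with a few lines of Python. This is only a sketch of that one case: the function names `blur2` and `unblur2` are made up for illustration, and exact fractions are used so that rounding does not get in the way.

```python
from fractions import Fraction


def blur2(a, b, w=Fraction(1, 10)):
    """Transfer a fraction w of each pixel's value to the other pixel."""
    return (1 - w) * a + w * b, w * a + (1 - w) * b


def unblur2(x, y, w=Fraction(1, 10)):
    """Solve the 2x2 system [[1-w, w], [w, 1-w]] @ (a, b) = (x, y) for (a, b)."""
    det = (1 - w) ** 2 - w ** 2  # simplifies to 1 - 2w
    if det == 0:
        raise ValueError("blur matrix is singular, the original cannot be recovered")
    a = ((1 - w) * x - w * y) / det
    b = ((1 - w) * y - w * x) / det
    return a, b


blurred = blur2(10, 20)
restored = unblur2(*blurred)
print(blurred, restored)
```

Here `blurred` is (11, 19) and `restored` is (10, 20) again. A real image has the same structure with a much larger matrix, and the 0-255 rounding step is what makes the recovery approximate rather than exact.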

You'd be surprised at the amount of image detail that can be recovered by filtering when the original distortion function is known. See also https://en.wikipedia.org/wiki/Deconvolution and the lower half of that page's "See also" links section.

>>Mauran+gm
So it would seem we should use a 0.5x + 0.5y blur to be sure to lose information - something that makes the matrix close to singular.
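
A quick way to see why the 0.5x + 0.5y case loses information, as a small Python sketch (the function name `blur_half` is made up for illustration): several different originals collapse to the same blurred pair, so there is nothing left to tell them apart.

```python
def blur_half(a, b):
    """Replace both pixels with their average (the 0.5x + 0.5y blur)."""
    mean = (a + b) / 2
    return mean, mean


# Three different two-pixel "images" with the same average.
originals = [(10, 20), (20, 10), (15, 15)]
results = {blur_half(a, b) for a, b in originals}

# Determinant of the blur matrix [[0.5, 0.5], [0.5, 0.5]].
det = 0.5 * 0.5 - 0.5 * 0.5
```

`results` contains a single pair, (15.0, 15.0), and `det` is 0, so the system has no unique solution.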

Also, how should we handle the boundaries? We select a box in the image and blur that; we'd want to handle the boundaries in a way that also makes sure we lose information.

>>kzrdud+VP
Yes, you could set each pixel in the blurred region to the same color (i.e. each matrix entry is identical) - that could be black or the average of the blurred region. "Pixelation" does the same but in smaller boxes.
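
A rough sketch of both options in plain Python, assuming a grayscale image stored as a list of rows of 0-255 integers. The function names and the box parameters (`top`, `left`, `height`, `width`, `block`) are made up for illustration, and a real tool would work on an actual image library's pixel buffer instead.

```python
def fill_box_with_mean(img, top, left, height, width):
    """Return a copy of img where every pixel in the box is the box's average."""
    out = [row[:] for row in img]
    rows = range(top, top + height)
    cols = range(left, left + width)
    box = [img[r][c] for r in rows for c in cols]
    mean = round(sum(box) / len(box))
    for r in rows:
        for c in cols:
            out[r][c] = mean
    return out


def pixelate(img, top, left, height, width, block):
    """Same idea, applied separately to each block-sized tile inside the box."""
    out = [row[:] for row in img]
    for r in range(top, top + height, block):
        for c in range(left, left + width, block):
            h = min(block, top + height - r)  # clip tiles at the box edge
            w = min(block, left + width - c)
            out = fill_box_with_mean(out, r, c, h, w)
    return out


img = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [90, 100, 110, 120],
       [130, 140, 150, 160]]

flat = fill_box_with_mean(img, 1, 1, 2, 2)
tiles = pixelate(img, 0, 0, 4, 4, 2)
```

The single-color fill keeps only one number for the whole region. Pixelation keeps one number per tile, so smaller tiles leave more of the original recoverable.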
