It’s really tedious to do manually, and this is where something like OpenCV shines.
We found a repo [1] with Python code that automatically detects and blurs faces. It was one of many such scripts, but it stood out for its very high accuracy: over 90%.
Removing exif data is a great idea.
[1] github.com/telesoho/faceblur
Perhaps it’s better to remove the section of the photo with a person’s face instead? Or draw a shape over their face and flatten the image? It seems to me that as long as the pixels are there, the identifying data is there for anyone willing to spend the time and effort to find it.
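A minimal sketch of the "draw a shape over it" idea, assuming a grayscale image stored as a plain list of rows. The function name `redact_box` and the box coordinates are made up for illustration. Every pixel in the box is overwritten with one constant, so nothing in the output depends on the original values there.

```python
def redact_box(image, top, left, height, width, fill=0):
    """Return a copy of `image` with a rectangle overwritten by `fill`.

    `image` is a list of rows of 0-255 ints (an assumed toy format,
    not any particular library's image type).
    """
    out = [row[:] for row in image]  # copy so the input is untouched
    for y in range(max(top, 0), min(top + height, len(out))):
        row = out[y]
        for x in range(max(left, 0), min(left + width, len(row))):
            row[x] = fill  # constant value: independent of the original pixel
    return out


# Hypothetical 4x4 image; the 2x2 "face" sits in the middle.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
redacted = redact_box(image, top=1, left=1, height=2, width=2)
```

With a real photo the box would have to end up in the saved pixels themselves rather than in a separate layer, which is the point of flattening.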
Edit: Apparently it was Interpol, not the US government. I can't find the Reddit thread, but here's a NYT article with the photo: https://thelede.blogs.nytimes.com/2007/10/08/interpol-untwir...
That can't be done with a blur. In a blur, pixels are merged or averaged together and information is lost. In some cases you could sharpen it a little, but it's still not going to be as good as the original image. In a really good blur, even the best sharpen algorithm isn't going to give you something that looks like an identifiable face.
I'd be careful with that assumption. The only thing that really loses information is the discretization back into 0-255 range, and that naturally loses very little information.
If you consider the pixels as a large vector of values, you're effectively multiplying it by a matrix (plus discretization afterwards). If that matrix has (near) full rank, you can restore (close to) all the information.
Consider a grayscale image with two pixels, a = 10 and b = 20. I apply a blur that transfers 10% of each pixel to the other one, and end up with 11 and 19. That leaves me with the equations 0.9 a + 0.1 b = 11 and 0.1 a + 0.9 b = 19. Clearly this system can be solved uniquely. Equivalently, the blur matrix (and I don't mean the kernel but the full blur operation matrix) is [[0.9, 0.1], [0.1, 0.9]], which has full rank and is thus invertible.
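To make the two-pixel example concrete, here is a small sketch that applies the 10% blur and then undoes it by inverting the 2x2 matrix directly. The names `blur_two`, `unblur_two` and `leak` are my own, chosen just for this illustration.

```python
def blur_two(a, b, leak=0.1):
    """Move `leak` of each pixel's value to the other pixel."""
    return (1 - leak) * a + leak * b, leak * a + (1 - leak) * b


def unblur_two(p, q, leak=0.1):
    """Invert blur_two using the inverse of [[1-leak, leak], [leak, 1-leak]]."""
    keep = 1 - leak
    det = keep * keep - leak * leak  # 0.8 for leak = 0.1; zero when leak = 0.5
    if abs(det) < 1e-12:
        raise ValueError("blur matrix is singular; the pixels cannot be separated")
    a = (keep * p - leak * q) / det
    b = (keep * q - leak * p) / det
    return a, b


blurred = blur_two(10, 20)
recovered = unblur_two(*blurred)
print([round(v, 6) for v in blurred])    # [11.0, 19.0]
print([round(v, 6) for v in recovered])  # [10.0, 20.0]
```

The only setting of this toy blur that loses the information is leak = 0.5, where both outputs become the same average and the matrix drops to rank 1.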
You'd be surprised at the amount of image detail that can be recovered by filtering when the original distortion function is known. See also https://en.wikipedia.org/wiki/Deconvolution and the lower half of that page's "See also" links section.
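As a rough illustration of recovery when the distortion is known, the sketch below blurs one row of made-up pixel values with a 3-tap kernel, rounds the result to integers as a stored image would, and then solves the linear system for the original. The kernel weights (0.2, 0.6, 0.2), the wrap-around edges and all function names are assumptions picked for the example, not anything a particular blur tool does.

```python
def blur_matrix(n, center=0.6, side=0.2):
    """Full n x n matrix of a 3-tap blur with wrap-around edges (assumes n >= 3)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = center
        m[i][(i - 1) % n] = side
        m[i][(i + 1) % n] = side
    return m


def apply_matrix(m, v):
    """Matrix-vector product: the blur applied to pixel row v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def solve(m, rhs):
    """Solve m @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


original = [12, 200, 35, 180, 90, 250, 5, 130]          # made-up pixel row
m = blur_matrix(len(original))
stored = [round(v) for v in apply_matrix(m, original)]  # rounded, as a file would keep it
estimate = solve(m, stored)
worst = max(abs(e - o) for e, o in zip(estimate, original))
print(stored)
print([round(e, 1) for e in estimate], "worst error:", round(worst, 2))
```

For this particular kernel each row's centre weight (0.6) exceeds the sum of its side weights (0.4), so the matrix is invertible and rounding errors of at most 0.5 can grow to at most 2.5 grey levels in the estimate. A wider or flatter kernel would amplify the rounding error far more, so this says nothing about every blur.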
Also, how should the boundaries be handled? We select a box in the image and blur that; we'd want to handle the boundaries in a way that also makes sure we lose information.