0: https://manifold.net/doc/mfd8/zip_codes_are_not_areas.htm
They don't represent geography at all; they represent the organizational structure of USPS.
They work by making the address on a letter almost meaningless. For some smaller population zip codes you can practically just put the name and zip code down and achieve delivery.
Yeah, but any analysis you're likely to perform is approximate enough that the fact that ZIP codes aren't polygons is basically a rounding error.
Plus, it's a lot easier to get ZIP codes, and they're more reliably correct, so you might still get better results than you would with another indicator that is either (a) less reliable or (b) less available.
You're not going to wind up with a situation where zip codes with the same regional marker end up on different coasts.
In other words, is it safe to assume that any entity in a zip code is less than x distance away from the closest entity in the same zip code?
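One way to test that assumption on your own data, rather than take it on faith, is to compute the worst-case nearest-neighbor distance per zip code. This is only a rough sketch: the function names and the input shape (pairs of zip code and lat/lon) are made up for illustration, it uses the haversine great-circle approximation, and the pairwise loop is quadratic per zip, so it only suits modest datasets.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def max_nearest_neighbor_km(records):
    """records: iterable of (zip_code, (lat, lon)) pairs (hypothetical input shape).

    Returns {zip_code: largest distance from any entity to its closest
    neighbor in the same zip}. Zips with a single entity are skipped.
    """
    by_zip = defaultdict(list)
    for zip_code, point in records:
        by_zip[zip_code].append(point)

    worst = {}
    for zip_code, points in by_zip.items():
        if len(points) < 2:
            continue  # no neighbor to compare against
        worst[zip_code] = max(
            min(haversine_km(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)
        )
    return worst
```

If the largest value in the result is below your x, the assumption holds for that dataset; it says nothing about zip codes or addresses you haven't sampled.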
A 5+4 formatted ZIP code maps to just a handful of addresses. In cities with larger populations, the +4 could map to a single building, and in more sparsely populated places, it might include houses on a handful of roads.
For smaller datasets, ZIP+4 might as well be a unique household identifier. I just checked a 10 million address database and 60% of entries had a unique ZIP+4, so one other bit of PII would be enough to be a 99.99% unique identifier per person.
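For anyone who wants to run the same kind of check on their own address list, a minimal sketch follows. The function name is hypothetical, and it assumes the ZIP+4 values are already normalized to one consistent string format; it just measures the share of entries whose ZIP+4 appears exactly once.

```python
from collections import Counter


def unique_zip4_share(zip4_values):
    """Fraction of entries whose ZIP+4 occurs exactly once in the list.

    Assumes values are pre-normalized strings such as "12345-6789".
    """
    counts = Counter(zip4_values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each singleton ZIP+4 corresponds to exactly one entry.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / total
```

The higher that share, the closer ZIP+4 comes to being a household identifier in that dataset.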
With a geo-coded ZIP+4 database, you could locate people with a precision that's proportional to the population density of their region.
Please see: https://opencagedata.com/guides/how-to-think-about-postcodes...
I write this as someone who grew up in the ZIP code 09180
Couldn't this happen for military or proxy codes (PO boxes or others)?
Zip codes don't even need to be contiguous. It's a mail delivery route, not a polygon.
There are 5 cases where the assumption is violated:
- Non-contiguous areas
- Zip codes that are a single point (some big companies get their own zip with a single mailbox, e.g. GE in Schenectady, NY is zip 12345)
- Zip codes that are a single line (highway-based delivery routes)
- Overlapping boundaries (since mail routes are linear, choosing a polygon representation is arbitrary and often not unique in space)
- Residents of some zip codes are not stationary (e.g. houseboats)
In short, asking questions about the area of a zip code is a category error - zip codes do not have a uniform representation in space. And we should be highly skeptical of any geospatial analysis that assumes polygons.
The big problem is zip codes are defined in terms of convenient postal routes and aren't suitable for most geospatial analysis. Census units, as the article explains, are a much better choice.
What they do not have is any sort of spatial consistency; they are a convenience for mail sorting. So if you start analyzing patterns across zip codes, you are pulling in information that is likely useless for, or harmful to, answering your question.