- Well-known (everybody knows their zip code)
- Easily extracted (they're part of every address, no geocoding required)
- Uniform-enough (not perfect, but in most cases close)
- Granular-enough
- Contiguous-enough by travel time
Notably, the alternatives the author proposes all fail on one or more of these:
- Census units: almost nobody knows what census tract they live in, and it can be non-trivial to map from address to tract
- Spatial cells: uneven distribution of population, and arbitrary division of space (boundaries pass right through buildings), and definitely nobody knows what S2 or H3 cell they live in.
- Address: this option doesn't even make sense. Yes, you can geocode addresses, but you still need to aggregate by something.
The fact is that a lot of web data contains a zip, but if you can collect something better it will usually give better results. The exception is analyzing shipments, where zip is fine.
Another consideration is what kind of reference information is available at different spatial units. There is plenty of Census Bureau data available by ZCTA, but some data may only be available at other aggregate units. Zip Codes are often used as political boundaries.
I'd also mention that the "best" areal unit depends on the data. There is a well-known phenomenon called the modifiable areal unit problem, in which spatial effects appear and vanish at different spatial resolutions. It can sort of be thought of as a spatial variation of the ecological fallacy.
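To make the modifiable areal unit problem concrete, here is a minimal sketch with made-up numbers (not real data). The helper names `pearson` and `aggregate` are just for illustration. The same observations are correlated once at the point level and once after averaging them into units of two neighbours, and the strength of the relationship changes with the unit size.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)

def aggregate(values, size):
    """Average consecutive runs of `size` values (a stand-in for areal units)."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values), size)]

# Made-up observations ordered along a street; purely illustrative.
x = [1, 2, 1, 2, 3, 4, 3, 4]
y = [2, 1, 2, 1, 4, 3, 4, 3]

r_points = pearson(x, y)                              # every observation on its own
r_units = pearson(aggregate(x, 2), aggregate(y, 2))   # averaged into units of 2

print(f"point level: r = {r_points:.2f}")
print(f"units of 2:  r = {r_units:.2f}")
```

With these toy numbers the aggregated correlation comes out stronger than the point-level one, purely because of how the units were drawn.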
At that point you need something like Smarty[1] to validate and parse addresses.
[0]: https://stackoverflow.com/questions/2783155/how-to-distingui...
The real problem is ever using an average without also specifying some sort of bounds. For median-based data, this probably means the upper and lower quartiles (or possibly other percentiles); for mean-based data, this probably means standard deviation.
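As a rough sketch of what that looks like in practice, the snippet below reports quartiles alongside the median and a standard deviation alongside the mean, using only Python's standard `statistics` module. The ZIP codes and income figures are invented for illustration, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Return centre-plus-spread summaries for one group of values."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (default method)
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
    }

# Hypothetical household incomes (in thousands) for two made-up ZIP codes.
incomes_by_zip = {
    "00001": [48, 50, 51, 52, 49],
    "00002": [10, 15, 50, 85, 90],
}

for zip_code, incomes in incomes_by_zip.items():
    print(zip_code, summarize(incomes))
```

Both made-up groups have the same mean and the same median, so an average alone cannot tell them apart. Only the quartiles and standard deviation show how different they are.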
Similar issues for city name, of course.
The functionality of it is closer to the "Zip+4" with extension used to have a more granular routing of physical mail for USPS.
https://www.canadapost-postescanada.ca/cpc/en/support/articl...
But broadly speaking, nobody knows what their ZIP+4 is, while I imagine that most people in Canada know their postal code by heart.
It is interesting.
And don't forget sales tax, which is state + county + city.
Most sites/apps will let me override the validator, but a few won't. The most common ones that insist on using the wrong address are financial institutions that say the law requires them to have my proper physical address and therefore they go with the (incorrectly) validated version.
USPS does not do home delivery in our area, and UPS/FedEx/etc. usually figure it out given that street numbers alone uniquely identify properties in our town.
Zip and distance as the crow flies often give shit data. My zip suggests I'm off in bum fuck, and since I'm on the Puget Sound, things that are relatively near as the crow flies can actually be hours away.
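For reference, "as the crow flies" usually means a great-circle distance. Here is a minimal sketch using the haversine formula. The coordinates are arbitrary illustrative points (not anyone's real address) and the spherical Earth radius is an approximation. Nothing in this calculation knows about water, bridges, or ferries, which is exactly why it can be misleading.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Two arbitrary points at the same latitude, 0.16 degrees of longitude apart.
# Straight-line distance only; actual travel time is not modelled at all.
print(f"{haversine_km(47.6, -122.34, 47.6, -122.50):.1f} km as the crow flies")
```

If travel time is what matters, this number would have to be replaced by a result from a routing service, which is outside the scope of this sketch.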
To the point that StatCan and other agencies have rules on the number of characters that are collected/disseminated with other data to make sure it's not too identifying:
* https://www.canada.ca/en/government/system/digital-governmen...
* https://www12.statcan.gc.ca/nhs-enm/2011/ref/DQ-QD/guide_2-e...
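Assuming the "number of characters" here refers to Canadian postal codes, one common way to coarsen them is to keep only the first three characters, the Forward Sortation Area (FSA). A minimal sketch follows. The regex checks only the general letter-digit-letter shape, not which letters Canada Post actually assigns, and `to_fsa` is a hypothetical helper name.

```python
import re

# Letter-digit-letter, optional space or hyphen, then digit-letter-digit.
# Assumption: shape check only; it does not validate against real codes.
POSTAL_CODE = re.compile(r"^([A-Z]\d[A-Z])[ -]?(\d[A-Z]\d)$")

def to_fsa(postal_code):
    """Return the three-character FSA, or None if the shape is wrong."""
    match = POSTAL_CODE.match(postal_code.strip().upper())
    return match.group(1) if match else None

print(to_fsa("k1a 0b1"))  # K1A
print(to_fsa("12345"))    # None
```

Storing only the FSA gives up precision in exchange for being less identifying. Whether three characters is the right cutoff depends on the agency rules linked above.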
If you get close enough, it usually gets handled in the local sort, but not always.
On cities, the mailing address city really is the name of the post office that handles your delivery route. Often there's a relationship with the city you live in, but there's cases both ways --- I used to live outside city limits, we had a census designated place name, a municipal sanitary district and had a fire department at one time... but never a post office, so our mailing address used the nearby city name, where our post office resided. The place name had an incorporated city on the other side of the state, so using that wouldn't be great.
Nowadays, post offices often have a list of alternative place names, so where I live now, I can pick between the incorporated city name, the nearby large city where a post office that processes all my mail is located, or any of the numerous small post offices that once served my city.
Every customer I've worked with insisted on having all addresses run through the USPS verification API so they could get their bulk mailing discounts.
Even if you get the delivery/cost side under control, you still have to make sure you are talking about the right address from a logical perspective. Mailing, physical, seasonal, etc. address types add a whole extra dimension of fun.
Regarding the article, whether to use ZIP Code (TM), postal code, Canada Post Forward Sortation Area, lat/lon, Census Bureau block and tract, etc. really depends on the use case.
As has been noted, the ZIP Code is often good enough for aggregating data together and can be a good first step if you don’t know where to start.
We have non-postal addresses and a lot of other mechanisms to help here. We also have contacts at the USPS and others to help fix addresses.
I’d love to have you email your mailing address to support@smarty.com with a link to this HN thread. We may be able to help fix some of this.
Bigger cities can have multiple post offices and zip codes with the same mail address city.
According to one commenter on the subject:
It doesn't matter, as long as the zip code is correct
[0]: https://www.city-data.com/forum/boston/601106-mailing-addres...
[1]: https://www.city-data.com/forum/boston/601106-mailing-addres...
That's only true if you can also access the spatial boundaries of the zipcodes themselves.
In Australia, this turns out not to be true: the postal system considers their boundaries to be commercial confidential information and doesn't share them. The best we can do is the Australian Bureau of Statistics' approximations of them, which they dub "postal areas".
The US Postal Service has a team of people that handles address updates. This team is localized to different regions so that they are generally aware of local nuances. If you need to talk to the USPS about getting an address issue resolved, simply go to this USPS AMS site and enter your zipcode to find the team that handles addresses in that area:
https://postalpro.usps.com/ppro-tools/address-management-sys...
If they don't answer, leave a message. They have helped me thousands of times in my last 14 years working with address validations.
And in this case the fire companies had no problem finding my house in spite of the incorrect information in town records. As you suggest the field people on the ground generally know what the ground truth is.