zlacker

Stop using zip codes for geospatial analysis (2019)

submitted by voxada+(OP) on 2025-02-07 16:46:47 | 184 points 129 comments

NOTE: showing posts with links only
2. jihadj+t7[view] [source] 2025-02-07 17:29:50
>>voxada+(OP)
To put it in plain mathematical language, ZIP codes are not defined as polygons [0]. The consequence is that performing any analysis with an assumption that ZIP codes are polygons is bound to be error-prone.

0: https://manifold.net/doc/mfd8/zip_codes_are_not_areas.htm

3. mattfo+k8[view] [source] 2025-02-07 17:34:05
>>voxada+(OP)
Funny to see this one pop up today (I wrote it way back when). I just refreshed it into a video on my channel: https://www.youtube.com/watch?v=x-opv4REEic

5. ajfrie+Q8[view] [source] 2025-02-07 17:36:50
>>voxada+(OP)
...and use H3 instead! https://h3geo.org/

6. jpjoi+69[view] [source] 2025-02-07 17:38:30
>>voxada+(OP)
Zip codes are just weird to use for anything other than mail in general because they’re set up based off infrastructure.

CGP Grey has a great video on this: https://m.youtube.com/watch?v=1K5oDtVAYzk

10. funkas+Ua[view] [source] 2025-02-07 17:48:50
>>voxada+(OP)
If you want to learn a bit more, there was a recent, really good Planet Money episode[1] about this exact topic. They focus on the problems you might face when using ZIP codes for demographic analysis.

[1]: https://www.npr.org/2025/01/08/1223466587/zip-code-history

11. throw0+ub[view] [source] 2025-02-07 17:52:20
>>voxada+(OP)
CGP Grey recently posted a video on Zip codes, "The Hidden Pattern in Post Codes":

* https://www.youtube.com/watch?v=1K5oDtVAYzk

20. serjes+xi[view] [source] 2025-02-07 18:30:26
>>voxada+(OP)
H3 is awesome here! What I don't think many people realize is that H3 cells and normal geographic data (like zips) are not mutually exclusive. You can take zip outlines, find all the H3 cells within them, and allocate your metric accordingly (population, income, etc).

This makes joining disparate data sources quite easy. And this also lets you do all sorts of cool stuff like aggregations, smoothing, flow modeling, etc.

We do some geospatial stuff and I wrote a polars plugin to help with this a while back [1].

[1] https://github.com/Filimoa/polars-h3
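
A minimal sketch of the allocation step described above, assuming you already have the list of H3 cells that fall inside each ZIP outline (for example from an H3 library's polygon-fill function). The cell IDs and population figures are made-up placeholders, and splitting the metric evenly across cells is a simplifying assumption, not necessarily how the plugin above works.

```python
from collections import defaultdict

# Hypothetical input: for each ZIP outline, the H3 cell IDs inside it.
# The IDs below are placeholders, not real H3 indexes.
ZIP_TO_CELLS = {
    "02134": ["cell_a", "cell_b", "cell_c", "cell_d"],
    "02135": ["cell_d", "cell_e"],
}
# Invented population figures, for illustration only.
ZIP_POPULATION = {"02134": 20000, "02135": 9000}


def allocate_to_cells(zip_to_cells, zip_metric):
    """Spread each ZIP's metric evenly over its cells (a crude assumption)."""
    per_cell = defaultdict(float)
    for zip_code, cells in zip_to_cells.items():
        if not cells:
            continue  # nothing to allocate to
        share = zip_metric[zip_code] / len(cells)
        for cell in cells:
            # A cell touched by two ZIPs accumulates a share from each.
            per_cell[cell] += share
    return dict(per_cell)


cell_population = allocate_to_cells(ZIP_TO_CELLS, ZIP_POPULATION)
```

Once every dataset is keyed by cell ID, joining them is an ordinary key join, which is the convenience the comment is pointing at. A more careful version would weight each cell by its overlap area or by a finer population raster.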

◧◩◪◨
24. ingeni+3m[view] [source] [discussion] 2025-02-07 18:51:49
>>ajfrie+si
Hey AJ, this is almost on topic: do you know of a more up-to-date version of the dataset you used in the blog post for the H3 v4.0.0 release [1]? They stopped updating it in Oct 2023. Thanks!

[1] https://data.humdata.org/dataset/kontur-population-dataset

25. PLenz+zm[view] [source] 2025-02-07 18:55:31
>>voxada+(OP)
I gave a talk at DataEngConf many years ago: https://www.datacouncil.ai/talks/zip-codes-and-other-lies-yo...

◧◩
26. ericra+Um[view] [source] [discussion] 2025-02-07 18:57:04
>>jonas2+Xd
This is a tangent, but addresses are also way more complicated than most people realize - especially if you’re relying on a user to input a correct address or if you need to support multiple countries, somewhere with unique addresses like Queens[0], or you need to differentiate between units of a specific street address that uses something other than unit numbers for a unit designation.

At that point you need something like Smarty[1] to validate and parse addresses.

[0]: https://stackoverflow.com/questions/2783155/how-to-distingui...

[1]: https://www.smarty.com/

34. zuhaye+Yr[view] [source] 2025-02-07 19:25:33
>>voxada+(OP)
This is interesting since zip codes came up in consideration for how we built out our pay choropleth map in the US: https://levels.fyi/heatmap

Though ultimately it was far too granular (for example the Bay Area would be so many different zip codes). Instead we went with Nielsen's DMA (Designated Market Area) mappings within the US to abstract aggregated data a bit better. And of course this DMA dataset also had a different original use case. It was used for TV / media market surveys so it has some weird vestiges. Some regions are grouped very far and wide (you'll notice there's a bit of Denver within Nevada and it's just a remnant of how it used to be categorized), but it still provides a bit of a broader level grouping than something acute like zip code.

I do like this map from the article though and the granularity you can get with zip code when zooming: https://clausa.app.carto.com/map/29fd0873-64cb-42a6-a90d-c83...

We've also been considering using Combined Statistical Areas using population instead. This is something that is under way, and in the interim we've considered charting styles that don't necessarily need borders (for example this bubble map: https://www.levels.fyi/bubble-plot/europe/). The benefit of DMAs is that they offer full border coverage of the entire US, whereas some hubs can still be missing from CSAs if relying on a population threshold. But the plan is to create some of our own regional definitions and borders using our own submissions combined with population. Will be an interesting project.

GeoJSON data for the map borders: https://github.com/PublicaMundi/MappingAPI/blob/master/data/...

Nielsen DMA regions: https://blocks.roadtolarissa.com/simzou/6459889

◧◩
41. walrus+5t[view] [source] [discussion] 2025-02-07 19:32:31
>>jonas2+Xd
In terms of "good enough", a Canadian postal code, broadly equivalent to a zip code, is much more granular and can often identify an individual apartment building, or single city block. Plenty of large office buildings in major Canadian cities also have their own postal code.

Functionally it is closer to "ZIP+4", the extension USPS uses for more granular routing of physical mail.

https://www.canadapost-postescanada.ca/cpc/en/support/articl...

https://en.wikipedia.org/wiki/Postal_codes_in_Canada

44. lacool+Ju[view] [source] 2025-02-07 19:41:29
>>voxada+(OP)
For anyone curious, here is the official US Gov list of ZIP codes in CSV with lots of helpful related data (longitude, latitude, etc.)

http://federalgovernmentzipcodes.us/free-zipcode-database-Pr...

45. ivell+Wu[view] [source] 2025-02-07 19:43:01
>>voxada+(OP)
India is experimenting with Digipin https://www.indiapost.gov.in/Navigation_Documents/Static_Nav...

Which is derived from longitude and latitude.

54. ubermo+ux[view] [source] 2025-02-07 20:00:14
>>voxada+(OP)
I'm reminded of this:

https://www.npr.org/2004/04/01/1805651/post-office-calls-for...

◧◩◪
58. kyleba+Zx[view] [source] [discussion] 2025-02-07 20:03:13
>>hammoc+cv
Equal distances to each adjacent neighbor: https://www.uber.com/blog/h3/
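
A small planar sketch of that property, not taken from the linked post: on an idealized flat hexagonal grid every neighbor's center is the same distance away, while a square grid has two different distances (edge neighbors and diagonal neighbors). Real H3 cells sit on a sphere, so there the distances are only approximately equal. The axial-coordinate layout used here is an assumption chosen for illustration.

```python
import math

# Axial coordinates (q, r) of the six neighbors on a flat hex grid
# whose cell centers are spaced 1 unit apart.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
# The eight neighbors of a square cell, including diagonals.
SQUARE_NEIGHBORS = [
    (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def hex_center(q, r):
    """Convert axial hex coordinates to planar x, y."""
    x = q + r / 2
    y = r * math.sqrt(3) / 2
    return x, y


# Distinct center-to-center distances from a cell to its neighbors.
hex_distances = sorted(
    {round(math.hypot(*hex_center(q, r)), 9) for q, r in HEX_NEIGHBORS}
)
square_distances = sorted(
    {round(math.hypot(dx, dy), 9) for dx, dy in SQUARE_NEIGHBORS}
)
```

Here `hex_distances` holds a single value and `square_distances` holds two, which is why hex grids are convenient for smoothing and flow analysis over neighbors.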
62. freyfo+AB[view] [source] 2025-02-07 20:24:25
>>voxada+(OP)
There are many problems with zip codes / postal codes but the biggest two we see are:

a. Excel treats them as numbers instead of strings of digits and thus drops the leading 0

b. Developers make assumptions about postal codes based on how they work (or more usually how the developer incorrectly thinks they work) in their own country and these assumptions absolutely do NOT hold in other countries.

A relevant guide to geocoding and postal codes: https://opencagedata.com/guides/how-to-think-about-postcodes...
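
A minimal sketch of problem (a), using made-up sample rows: keep postal codes as text end to end, and only pad them back if you know for certain they were 5-digit US ZIPs. The helper name `repair_us_zip` is hypothetical.

```python
import csv
import io

# Made-up sample data for illustration.
RAW_CSV = "zip,city\n02134,Allston\n90210,Beverly Hills\n"

# The csv module keeps every field as a string, so the leading zero
# survives as long as nothing casts the value to a number.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
zips_as_text = [row["zip"] for row in rows]

# What a spreadsheet-style numeric conversion does to the same column.
zips_as_numbers = [int(z) for z in zips_as_text]


def repair_us_zip(value):
    """Best-effort repair for 5-digit US ZIPs only: pad back to five digits.

    Assumes the value really was a US ZIP; this is not valid for other
    countries' postal codes (see problem b).
    """
    return str(value).zfill(5)


repaired = [repair_us_zip(n) for n in zips_as_numbers]
```

The padding repair is exactly the kind of country-specific assumption that point (b) warns about, so it is safer to never lose the string in the first place.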

◧◩
69. killjo+GC[view] [source] [discussion] 2025-02-07 20:30:07
>>jonas2+Xd
ZIPs are also specifically used in a variety of medical, epidemiologic, and public health contexts, and HHS has explicit, fairly fine-grained rules on their use: https://www.hhs.gov/hipaa/for-professionals/special-topics/d...
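
A rough sketch of the best-known of those rules, the Safe Harbor treatment of ZIP codes: keep only the first three digits, and only when the area formed by all ZIPs sharing that prefix has more than 20,000 people; otherwise report 000. The population table below is invented for illustration (real figures would come from Census data), the function name is hypothetical, and the linked guidance is the authoritative wording.

```python
# Hypothetical population totals per 3-digit ZIP prefix; invented numbers.
PREFIX_POPULATION = {"021": 1_200_000, "036": 12_000}
THRESHOLD = 20_000  # a prefix must cover more than this many people


def deidentify_zip(zip_code, prefix_population=PREFIX_POPULATION):
    """Return the 3-digit prefix if its area is populous enough, else '000'."""
    prefix = zip_code[:3]
    # Unknown prefixes are treated as too small, which errs on the safe side.
    if prefix_population.get(prefix, 0) > THRESHOLD:
        return prefix
    return "000"
```

For example, `deidentify_zip("02134")` keeps `"021"` under the invented table above, while `deidentify_zip("03601")` collapses to `"000"`.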
71. 0xbadc+RC[view] [source] 2025-02-07 20:30:55
>>voxada+(OP)
Here's a recent podcast about why ZIP codes are not great for analysis: https://www.npr.org/2025/01/08/1223466587/zip-code-history

74. Anon84+JD[view] [source] 2025-02-07 20:36:04
>>voxada+(OP)
This is an example of the well-known modifiable areal unit problem: https://en.wikipedia.org/wiki/Modifiable_areal_unit_problem

In general, your statistics depend on how you define your areas, and you will get different pictures with different definitions.
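
A toy illustration with invented numbers: the same six households along one street, averaged under two different zone boundaries. One zoning shows a wealthy pocket in the middle; the other shows two nearly identical zones.

```python
from statistics import mean

# Invented (position, income in thousands) pairs along a single street.
HOUSEHOLDS = [(0, 30), (1, 32), (2, 90), (3, 95), (4, 35), (5, 31)]


def zone_means(households, boundaries):
    """Average income per zone; zone i covers [boundaries[i], boundaries[i+1])."""
    result = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        values = [income for pos, income in households if lo <= pos < hi]
        result.append(mean(values))
    return result


zoning_a = zone_means(HOUSEHOLDS, [0, 2, 4, 6])  # three zones of two households
zoning_b = zone_means(HOUSEHOLDS, [0, 3, 6])     # two zones of three households
```

Nothing about the households changed between `zoning_a` and `zoning_b`; only the boundaries did.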
◧◩◪
81. throw0+YG[view] [source] [discussion] 2025-02-07 20:57:22
>>walrus+5t
> In terms of "good enough", a Canadian postal code, broadly equivalent to a zip code, is much more granular and can often identify an individual apartment building, or single city block.

To the point that StatCan and other agencies have rules on the number of characters that are collected/disseminated with other data to make sure it's not too identifying:

* https://www.canada.ca/en/government/system/digital-governmen...

* https://www12.statcan.gc.ca/nhs-enm/2011/ref/DQ-QD/guide_2-e...

◧◩◪◨⬒
99. freyfo+nV[view] [source] [discussion] 2025-02-07 22:34:29
>>alsodu+us
It is safe to assume nothing.

Please see: https://opencagedata.com/guides/how-to-think-about-postcodes...

I write this as someone who grew up in the ZIP code 09180

◧◩◪◨⬒
117. ericra+Hq1[view] [source] [discussion] 2025-02-08 03:54:33
>>VWWHFS+aw
This is sort of apocryphal - and also anecdotal because I have my own personal experience living in an annexed Boston neighborhood to draw on - but in a lot of the towns/neighborhoods that have been annexed by Boston, people still use the neighborhood name[1] as the city name because you are more likely to get your package when you indicate which “Washington St,” “Boylston St,” etc. you actually live at.

According to one commenter on the subject [0]:

  It doesn't matter, as long as the zip code is correct

[0]: https://www.city-data.com/forum/boston/601106-mailing-addres...

[1]: https://www.city-data.com/forum/boston/601106-mailing-addres...

123. cwmoor+t42[view] [source] 2025-02-08 13:03:05
>>voxada+(OP)
Need a regex[2] using a trie to match valid state-zipcode pairs[1] for webpages likely to contain valid addresses?

[1] https://techbio.org/wiki/Addresses/finding-addresses-in-webp...

[2] https://techbio.org/wiki/Addresses/zipcode-trie-regex
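
A minimal sketch of the general idea, not necessarily the approach at the links above: build a trie from the valid ZIPs for each state, turn each trie into a regex fragment that shares common prefixes, and join the per-state fragments. The `STATE_ZIPS` table is a tiny made-up sample and the helper names are hypothetical.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_pattern(node):
    """Turn a trie into a regex fragment that shares common prefixes."""
    branches = []
    optional = False
    for ch, child in sorted(node.items()):
        if ch == "":
            optional = True  # a word may end at this node
        else:
            branches.append(re.escape(ch) + trie_to_pattern(child))
    if not branches:
        return ""
    body = branches[0] if len(branches) == 1 else "|".join(branches)
    if len(branches) > 1 or optional:
        body = "(?:" + body + ")"
    return body + "?" if optional else body


# Tiny made-up sample; a real table would list every valid ZIP per state.
STATE_ZIPS = {"MA": ["02134", "02135", "02139"], "NH": ["03801"]}

STATE_ZIP_RE = re.compile(
    r"\b(?:"
    + "|".join(
        re.escape(state) + r",?\s+(?:" + trie_to_pattern(build_trie(zips)) + ")"
        for state, zips in sorted(STATE_ZIPS.items())
    )
    + r")\b"
)
```

With this sample table, `STATE_ZIP_RE.search("Allston, MA 02134")` matches while `"MA 03801"` does not, because the ZIP belongs to the other state's trie.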

◧◩◪◨
127. jwnacn+fW2[view] [source] [discussion] 2025-02-08 20:44:32
>>ghaff+QF
Here is a little-known (but very useful) piece of information.

The US Postal Service has a team of people that handles address updates. This team is localized to different regions so that they are generally aware of local nuances. If you need to talk to the USPS about getting an address issue resolved, simply go to this USPS AMS site and enter your ZIP code to find the team that handles addresses in that area:

https://postalpro.usps.com/ppro-tools/address-management-sys...

If they don't answer, leave a message. They have helped me thousands of times over my last 14 years working on address validation.
