zlacker

18 comments
1. jihadj+(OP) 2025-02-07 17:29:50
To put it in plain mathematical language, ZIP codes are not defined as polygons [0]. The consequence is that performing any analysis with an assumption that ZIP codes are polygons is bound to be error-prone.

0: https://manifold.net/doc/mfd8/zip_codes_are_not_areas.htm

2. mholt+Y7 2025-02-07 18:15:13
>>jihadj+(OP)
Yeah. ZIP codes are sets in the abstract-dimensional space of carrier delivery points. I suppose you could think of them as lines, but definitely not polygons.
3. cogman+Ja 2025-02-07 18:28:37
>>mholt+Y7
Zip codes (in the US) are machine-readable numbers a mail sorter can use to send a parcel to the right delivery truck for final delivery. In the US, they represent the hierarchy of postal centers, with the most significant digit representing the primary hub for a region and the least significant digits the actual post office that will be in charge of delivering the letter (or the truck, if you use the extended ZIP+4 code).

They don't represent geography at all, they represent the organizational structure of USPS.

They work by making the address on a letter almost meaningless. For some smaller population zip codes you can practically just put the name and zip code down and achieve delivery.
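To make the hierarchy concrete, here is a minimal Python sketch that splits a code into its nested prefixes. The level names (first digit = national area, first three digits = sectional center, five digits = delivery area, +4 = delivery segment) follow the commonly described USPS scheme and are an assumption here; `parse_zip` and `ZipHierarchy` are hypothetical names. Note that nothing in it is geometric: it only decodes routing labels.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Accepts "12345" or "12345-6789" (assumed input formats).
ZIP_RE = re.compile(r"^(\d{5})(?:-(\d{4}))?$")


@dataclass(frozen=True)
class ZipHierarchy:
    national_area: str     # 1st digit: broad group of states (assumed meaning)
    sectional_center: str  # first 3 digits: regional sorting facility
    delivery_area: str     # all 5 digits: post office / delivery area
    plus4: Optional[str]   # optional +4: a delivery segment within the area


def parse_zip(code: str) -> ZipHierarchy:
    """Split a ZIP or ZIP+4 string into its routing prefixes."""
    m = ZIP_RE.match(code.strip())
    if m is None:
        raise ValueError(f"not a ZIP or ZIP+4: {code!r}")
    five, plus4 = m.group(1), m.group(2)
    return ZipHierarchy(five[0], five[:3], five, plus4)
```

For example, `parse_zip("12345-6789")` gives national area `"1"`, sectional center `"123"`, delivery area `"12345"` and segment `"6789"`.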

4. mcphag+5g 2025-02-07 19:00:51
>>jihadj+(OP)
> The consequence is that performing any analysis with an assumption that ZIP codes are polygons is bound to be error-prone.

Yeah, but any analysis you're likely to perform is approximate enough that the fact that ZIP codes aren't polygons is basically a rounding error.

Plus, it's a lot easier to get ZIP codes, and they're more reliably correct, so you might still get better results than you would by going with another indicator that is either (a) less reliable or (b) less available.

5. Spivak+uk 2025-02-07 19:25:29
>>cogman+Ja
Right, but this ends up being a good approximation for geography, because the reality of logistics is that you end up doing a cute n-ary search of the geography. Once you know the regional hub, you can rule out with certainty a huge chunk of the US that the zip code doesn't represent. And then you keep n-secting. Sometimes the land mass you get at the end is specific enough for your uses.

You're not going to wind up with a situation where zip codes with the same regional marker end up on different coasts.
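A minimal sketch of the "keep n-secting" idea: each additional hierarchy level shrinks the set of codes a target could be confused with. It assumes the hierarchy branches at 1, 3 and 5 digits, and the example codes are arbitrary; it narrows only in label space and says nothing by itself about how tightly those labels cluster on the map.

```python
from typing import Iterable

# Prefix lengths at which the ZIP hierarchy is assumed to branch.
LEVELS = (1, 3, 5)


def narrowing_steps(target: str, known_zips: Iterable[str]) -> list[tuple[str, int]]:
    """For each level, return the target's prefix and how many known ZIPs share it."""
    candidates = [z[:5] for z in known_zips]
    steps = []
    for n in LEVELS:
        prefix = target[:n]
        # Keep only codes under the same node of the hierarchy.
        candidates = [z for z in candidates if z.startswith(prefix)]
        steps.append((prefix, len(candidates)))
    return steps
```

With `["12345", "12309", "13202", "90210", "10001"]` as the known codes, `narrowing_steps("12345", ...)` goes from 4 candidates at prefix `"1"` to 2 at `"123"` to 1 at `"12345"`.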

6. alsodu+1l 2025-02-07 19:28:11
>>cogman+Ja
I agree that they weren't explicitly meant to represent geography, but implicitly they do, right? Are there cases where this is violated?

In other words, is it safe to assume that every entity in a zip code is less than x distance away from the closest other entity in the same zip code?
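One way to put a number on that x empirically is sketched below, assuming you have hypothetical geocoded `(zip_code, (lat, lon))` records. For each ZIP it finds every point's nearest other point (great-circle distance) and reports the worst case. It only describes the sample you feed it; it cannot rule out violations elsewhere, and the brute-force loop is O(n²) per ZIP.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def worst_nearest_neighbor_km(records):
    """records: iterable of (zip_code, (lat, lon)).

    Returns {zip_code: x}, where x is the largest distance from any point
    to its nearest other point in the same ZIP.
    """
    by_zip = defaultdict(list)
    for zip_code, point in records:
        by_zip[zip_code].append(point)
    result = {}
    for zip_code, pts in by_zip.items():
        if len(pts) < 2:
            continue  # a lone point has no neighbor, so x is undefined
        result[zip_code] = max(
            min(haversine_km(p, q) for j, q in enumerate(pts) if j != i)
            for i, p in enumerate(pts)
        )
    return result
```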

7. mywitt+Ln 2025-02-07 19:44:53
>>cogman+Ja
> For some smaller population zip codes you can practically just put the name and zip code down and achieve delivery.

A 5+4 formatted ZIP code maps to just a handful of addresses. In cities with larger populations, the +4 could map to a single building, and in more sparsely populated places, it might include houses on a handful of roads.

For smaller datasets, ZIP+4 might as well be a unique household identifier. I just checked a 10 million address database and 60% of entries had a unique ZIP+4, so adding just one more piece of PII would be enough to make it a 99.99% unique identifier per person.

With a geo-coded ZIP+4 database, you could locate people with a precision that's proportional to the population density of their region.
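A minimal sketch of how one might reproduce that kind of uniqueness check on one's own data: count the share of records whose key occurs exactly once, first with ZIP+4 alone and then with ZIP+4 plus one more attribute. The field choice (here a birth year) is hypothetical, and no figures from the comment above are implied.

```python
from collections import Counter
from typing import Hashable, Iterable


def unique_share(keys: Iterable[Hashable]) -> float:
    """Fraction of records whose key appears exactly once in the dataset."""
    keys = list(keys)
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)
```

Usage: `unique_share(r.zip4 for r in rows)` versus `unique_share((r.zip4, r.birth_year) for r in rows)`, where `rows` and its fields are hypothetical. Combining keys as tuples is what lets one extra attribute split otherwise-shared ZIP+4 values.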

8. mattfo+Du 2025-02-07 20:27:41
>>cogman+Ja
Well put
9. mattfo+Qu 2025-02-07 20:28:50
>>mywitt+Ln
Yeah, but we already have that in the census hierarchy. Plus, you have to pay to access ZIP+4 geospatial data, and it sometimes changes as often as quarterly.
10. mattfo+av 2025-02-07 20:30:03
>>Spivak+uk
Just use a spatial query. That’s what they are made for.
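The core of such a query is a point-in-polygon test. A spatial database (e.g. PostGIS with `ST_Contains`) adds indexing and proper geodetic handling; the sketch below is only the classic even-odd ray-casting test in plain Python, assuming planar `(x, y)` coordinates and a simple (non-self-intersecting) polygon such as one ring of a census boundary.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: is (x, y) inside the simple polygon [(x0, y0), ...]?

    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line (y1 != y2 here).
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips parity
    return inside
```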
11. mattfo+rv 2025-02-07 20:31:05
>>mcphag+5g
They aren’t reliably correct, actually. The boundaries that the Census publishes are called ZIP Code Tabulation Areas (ZCTAs), which are approximations of zip codes and include overlaps.
12. freyfo+UN 2025-02-07 22:34:29
>>alsodu+1l
It is safe to assume nothing.

Please see: https://opencagedata.com/guides/how-to-think-about-postcodes...

I write this as someone who grew up in the ZIP code 09180

13. makeit+HO 2025-02-07 22:40:16
>>alsodu+1l
It might be true, but does it help if x varies from "on a nearby mountain" to "within a street block", and sometimes every inhabitant is closer to another zip code than to their own?
14. makeit+jP 2025-02-07 22:44:15
>>Spivak+uk
> You're not going to wind up with a situation where zip codes with the same regional marker end up on different coasts.

Couldn't this happen for military or proxy codes (PO boxes or others)?

15. wombat+bQ 2025-02-07 22:51:15
>>mattfo+rv
ZCTA5 roughly corresponds to the area of a 5 digit zip code. Problem is there are large areas of the west that don’t have permanent residents and no mail delivery. Plus they change over time.
16. perryg+wT 2025-02-07 23:19:51
>>alsodu+1l
> less than x distance away

zip codes don't even need to be contiguous. It's a mail delivery route, not a polygon.

There are 5 cases where the assumption is violated:

- Non-contiguous areas

- Zip codes that are a single point (some big companies get their own zip with a single mailbox, e.g. GE in Schenectady, NY is zip 12345)

- Zip codes that are a single line (highway-based delivery routes)

- Overlapping boundaries (since mail routes are linear, choosing a polygon representation is arbitrary and often not unique in space)

- Residents of some zip codes are not stationary (e.g. houseboats)

In short, asking questions about the area of a zip code is a category error - zip codes do not have a uniform representation in space. And we should be highly skeptical of any geospatial analysis that assumes polygons.
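The five cases above suggest modelling a ZIP's geometry as a tagged union rather than forcing everything into a polygon. A minimal sketch follows; the class names and the `(lon, lat)` coordinate convention are hypothetical, and the area function uses the planar shoelace formula, assuming the rings are disjoint pieces without holes. Even for the areal case, the ring chosen to wrap a set of linear routes is, as noted above, an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Optional, Union

Coord = tuple[float, float]  # (lon, lat), an assumed convention


@dataclass
class PointZip:       # e.g. a single-mailbox ZIP for one organization
    location: Coord


@dataclass
class LineZip:        # e.g. a highway-based delivery route
    path: list[Coord]


@dataclass
class MultiAreaZip:   # possibly non-contiguous; may overlap other ZIPs
    rings: list[list[Coord]]


@dataclass
class MobileZip:      # residents with no fixed position (e.g. houseboats)
    pass


ZipGeometry = Union[PointZip, LineZip, MultiAreaZip, MobileZip]


def shoelace_area(ring: list[Coord]) -> float:
    """Planar area of one simple ring, in coordinate units squared."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def area_or_none(geom: ZipGeometry) -> Optional[float]:
    """Return an area only when the representation is areal; otherwise None."""
    if isinstance(geom, MultiAreaZip):
        return sum(shoelace_area(r) for r in geom.rings)
    return None  # points, lines and mobile routes have no area to ask about
```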

17. Nelson+8X 2025-02-07 23:50:28
>>jihadj+(OP)
That's not the important problem and there's a simple solution with ZCTAs.

The big problem is zip codes are defined in terms of convenient postal routes and aren't suitable for most geospatial analysis. Census units, as the article explains, are a much better choice.

18. BiteCo+HR1 2025-02-08 12:02:51
>>Nelson+8X
You ask a customer their census unit on purchase though.
19. maxeri+zX1 2025-02-08 13:10:16
>>alsodu+1l
They do provide a location with whatever error bars on it.

What they do not have is any sort of spatial consistency, they are a convenience for mail sorting. So if you start analyzing patterns across zip codes, you are pulling in information that is likely useless for or harmful to answering your question.
