Optimizing distance calculations

My brother wrote about calculating distances with PostGIS. I followed up with some suggestions, but WordPress seems to be eating my comments (strip_tags() is mutilating them, stripping anything after a “<”). So I’m following up here.

Most of my first comment got eaten; I went on to optimize the query significantly. It should have a WHERE s.zip = 10001, which makes it O(n+1). Obviously, there’s no point in an unconstrained O(nn) JOIN, since we only care about the distance of one zip to the others.

What I described was an optimization where you use the JOIN to narrow the resultset to an approximate area, then refine it with distance_sphere(). So you tighten the focus of the expensive operation to the results which are most likely to be relevant.

In other words, if you’re looking for something in Delaware, don’t bother with Nevada.

The magic here is that the average distance of one degree of lat/lon is 69.2 miles. So you can:

SELECT d.* FROM `zip` s JOIN `zip` d
 WHERE s.`zip` = 10001
   AND (ABS(s.`lat` - d.`lat`) < = (15 / 34.6))
   AND (ABS(s.`lon` - d.`lon`) < = (15 / 34.6))
   AND distance_sphere(s.`geom`, d.`geom`) < = (15 * 1609.344);

So the first WHERE clause narrows it to one row in the left zip table, then the next two narrow the results in the right table to anything within a 15×15 mile square around the source, then distance_sphere() operates on this much smaller dataset. You’d likely need to EXPLAIN that to get the correct order – some conditions may need to go into ON rather than WHERE.

He’s right about the index, though, it wouldn’t help.

2008/11/26

Discussion

I see exactly what you’re doing… I’ve been down this path before, and it works a lot better in theory than in practice. The distance_sphere function is fairly optimized, so adding stuff on top of it usually makes the query slower, rather than faster.

In certain cases, bounding-box lookups can be indexed. I just haven’t had the time to look into it.

Cameron Eure
2008/11/26

And to clarify, I’m talking about doing something like:

CREATE INDEX zip_geom_idx ON zip USING GIST(geom);

SELECT * FROM zip WHERE geom && ‘BOX()’::box2d AND distance_sphere(geom, ) <= (15 * 1609.344);

Cameron Eure
2008/11/26

Participate