List:General Discussion« Previous MessageNext Message »
From:paris lundis Date:April 19 2005 4:13am
Subject:Re: zip code search within x miles
View as plain text  
I'll share my creative workaround for geographic searches within x miles.

This is a theory I developed years ago and have used in live production
systems for the last five years. This method to be described is used
to produce approximately 100-200k sets of live nearby data per day online.

My approach is necessary for ColdFusion application server display of
relative nearby geographic data on my very busy website: 
http://www.pubcrawler.com
I use MySQL for the database storage with some MS SQL.

First, there are several different formulas readily available as math by 
which to
re-engineer for your particular use. The formulas should be able to be 
ported to any
platform/language given the math functions needed are available.

My concept  is simple and I believe  users are willing to accept some 
small margin of
error. This meaning, a place say a mile or two over further in one 
direction.

Zipcodes are notoriously odd in their adoption in areas. They follow a 
North to South and East to West incrementation.
Certain inferences can simply be made about a high zipcode or a low one 
belonging to either coast.

Nearby zipcodes are minimally useful, other than to say here are other 
zipcodes (say for instance of a real estate
application). The wild goose chase.

By all means, you need to reduce the amount of math that goes into 
preparing a query and say a page in your
application.

What I do, to avoid pre-compilation - that is making all the possible or 
common possible results already plotted
and thereby pre-planned - is work on establishing the minimums and 
maximums at four values - these are a min and max for both lat and lon:

 NW        |         NE
                |
----------------------
                |
SW          |        SE


I establish these as the max and mins based on converting the distance 
(fraction of mile to many miles) to degrees and doing the math to 
calculate each of the four points.

With these four points I can make a very simple and fast comparison 
where latitude =< maxlat and latitude => minlat and longitude =< maxlon 
and longitude >= minlon

This approach works on a square and gives you an error margin which is 
the difference of placing the distance exhibited as a circle within the 
same space exhibited as a square.
The error or overlap areas are minimal and for terms of consumer based 
services will suffice. These areas are triangles with the inward side 
being an arc.

I use this approach to relate okay sized sets of data to one another on 
the fly (500k restaurants, 6000 beer establishments, 13k bed and 
breakfasts, 50k hotels, etc.)

All the math is and can be done on the fly this way on commodity 
hardware (800Mhz server) on the application server level.  Bringing this 
to a pure
SQL level would increase the speed further.

A correlation for nearby hotels with this approach will run in a 
multi-use environment on average in about four seconds. With further 
optimization this can be
reduce to around 1 second..

This approach is a compromise to more exact math offered elsewhere on 
the internet.

Applying this same thing to apply to the 80k estimated US zipcodes 
currently would be pie and would run faster just due to the reduced dataset.

Zipcode databases are readily available with quarterly subscription 
updates available today. I believe the US Postal Service even sells a 
dataset now.


Richard Lynch wrote:

>You can buy up-to-the-minute zips, or snag TIGER (or gazateer?) data
>that's a bit old for free.
>
>It's a 1-1 mapping of zip to long/lat.
>
>The tricky bit is this.
>
>There's about 65,000 zips, even in the out-dated list for free.
>
>Let's say you've got, oh, 2000 records to search through.
>
>You're gonna end up doing a JOIN with:
>2,000 X 65,000 == 130,000,000 tuples (records/results).
>
>130 MILLION tuples is *way* too many for your basic $20/month site.
>
>So what you do is break the rules.
>
>That's right, it's rule-breaking time.
>
>First, add a longitude and latitude column to your 2000 record table,
>default to NULL.
>
>Next, add a trigger (in MySQL 4.1+ I think) so that any time you change
>the country or zip, the latitude/longitude gets looked up in the zips
>table, and filled in.
>
>If you can't do triggers (in MySQL pre-4.1) you'll have to arrange your
>Busicness Logic such that any time you UPDATE country or zip in your 2000
>records table, you reset longitude/latitude to NULL.
>
>Next, write a cron job to snag, say, 10 records from your table "WHERE
>latitude is NULL" in random order.
>
>Lookup the country and zip in your zips table, and get your $latitude and
>$longitude for that country/zip then do:
>
>update EXAMPLE set latitude = $latitude, longitude = $longitude where
>country = '$country' and zip = '$zip'
>
>Errr, I kinda slipped into PHP there, but you probably know what I mean. 
>$x is a variable holding the values you looked up.
>
>Run that cron job a whole lot at first, by hand, to get mostly all the
>2000 records fixed up with the right long/lat.
>
>The point of this is that *NOW* you only have to search through 2000
>records for your distance function, which sure beats a join with 130
>million records, eh?
>
>Also, if you just want the NEAREST matches, forget all that complicated
>crap about Great Circle distance and expensive trigonometric calculations.
>
>You can just get the zip they want to search by, lookup $longitude and
>$latitude for that zip, and then use:
>
>select {application_data_here), abs(latitude - $latitude) + abs(longitude
>- $longitude) as closest from ... order by closest
>
>If MySQL won't let you use 'closest' in the where clause, just move the
>calculations there.
>
>I've done this before for clients who didn't want to muck with all this. 
>It's a few hours' work.  Call it $200.
>
>PS I'm working on a GLOBAL POSTAL system which will data-gather OSS
>long/lat and country/zip for the entire world, rather than the current
>mess of having only the US for free, and maybe Canada and Australia, but
>not much else is free, and often not even available.
>
>I guess I'd better post here when I launch that.  No time-line available yet.
>
>On Fri, April 15, 2005 3:37 pm, Scott Haneda said:
>  
>
>>How are sites doing the search by zip and coming up with results within x
>>miles?  Is there some OSS zip code download that has been created for
>>this?
>>--
>>-------------------------------------------------------------
>>Scott Haneda                                Tel: 415.898.2602
>><http://www.newgeo.com>                     Novato, CA U.S.A.
>>
>>
>>
>>--
>>MySQL General Mailing List
>>For list archives: http://lists.mysql.com/mysql
>>To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
>>
>>
>>    
>>
>
>
>  
>

Thread
zip code search within x milesScott Haneda16 Apr
  • RE: zip code search within x milesgunmuse16 Apr
  • Re: zip code search within x milesRobert Dunlop16 Apr
    • RE: zip code search within x milesScott Johnson19 Apr
  • Re: zip code search within x milesGreg Donald16 Apr
  • Re: zip code search within x milesScott Gifford16 Apr
    • RE: zip code search within x milesgunmuse16 Apr
      • Re: zip code search within x milesJeff Kolber16 Apr
        • Re: zip code search within x milesScott Gifford17 Apr
          • Re: zip code search within x milesJeff Kolber20 Apr
  • Re: zip code search within x milesRichard Lynch19 Apr
    • Re: zip code search within x milesparis lundis19 Apr
      • Re: zip code search within x milesHank19 Apr
    • Re: zip code search within x milesHank19 Apr
      • Re: zip code search within x milesRichard Lynch25 Apr
  • Re: zip code search within x milesEamon Daly19 Apr
    • RE: zip code search within x milesScott Johnson19 Apr
  • Re: zip code search within x milesEamon Daly19 Apr
    • Re: zip code search within x milesHank19 Apr
      • Re: zip code search within x milesKeith Ivey19 Apr
        • Re: zip code search within x milesHank19 Apr
          • Re: zip code search within x milesSGreen19 Apr
            • Re: zip code search within x milesHank19 Apr
              • Re: zip code search within x milesRichard Lynch25 Apr
            • Re: zip code search within x milesJeremy Cole29 Apr
          • Problem detected this morning.Fredrick Bartlett5 Aug
            • Re: Problem detected this morning.Gleb Paharenko7 Aug
        • Re: zip code search within x milesScott Haneda19 Apr
      • Re: zip code search within x milesRichard Lynch25 Apr
        • RE: zip code search within x milesgunmuse25 Apr