List:General Discussion« Previous MessageNext Message »
From:Bill Adams Date:October 23 2001 5:32pm
Subject:Re: Length limit of 500 on primary keys?
View as plain text  
Steve Meyers wrote:

> > > At a previous job, we tested a 32-bit hash function by running it
> > > against hundreds of thousands of unique URL's stored in our
> > > database.  We found one collision.  A 64-bit hash is billions of
> > > times better (4 billion, to be exact).
> >
> > Good to know.  I wonder how many collisions I'd find if I ran it over
> > every URL listed in the directory www.yahoo.com.
> >
> > Which 64 bit hash function did you use?  Invent your own, or something
> > "off the shelf"?
> >
>
> We found a public domain one on the net see
> http://www.burtleburtle.net/bob/hash/evahash.html for some sample code.  It's only a
> 32-bit hash though.  However, that same page appears to have instructions for a 64-bit
> hash function as well, but I haven't tried it at all.  I'd be curious to know how many
> collisions you find hashing all the URL's in yahoo's database :)  I don't know how long
> that would take, but if you do it I'd like to hear the results.
>
> Since the hash function takes a key and an initial value, you could try running it
> with two different initial values and/or keys.  This would give you effectively a 128-bit
> hash, which you could store across two fields in MySQL.  I'm guessing that the 64-bit hash
> will probably be good enough though.

I am not understanding why having a hash and the full url in the database would not take
care of the collisions.  Even if you had 10 collisions for a 16 bit hash (say), if your
query was:
SELECT ... WHERE hash=thehashvalue AND url='theurl' you would get very fast lookups on the
hash and the url comparison would not add much to the query at that point.  You could even
do a partial index on the url, e.g.  "KEY( hash, url(200))".

b.


Thread
Length limit of 500 on primary keys?Shankar Unni21 Oct
  • RE: Length limit of 500 on primary keys?Chris Bolt21 Oct
    • RE: Length limit of 500 on primary keys?Steve Meyers21 Oct
      • Re: Length limit of 500 on primary keys?Shankar Unni22 Oct
        • RE: Length limit of 500 on primary keys?Steve Meyers22 Oct
          • Re: Length limit of 500 on primary keys?Jeremy Zawodny23 Oct
            • RE: Length limit of 500 on primary keys?Steve Meyers23 Oct
              • RE: Length limit of 500 on primary keys?Steve Meyers23 Oct
                • Re: Length limit of 500 on primary keys?Shankar Unni24 Oct
              • Re: Length limit of 500 on primary keys?Bill Adams23 Oct
                • RE: Length limit of 500 on primary keys?Steve Meyers23 Oct
              • RE: Length limit of 500 on primary keys?Don Read23 Oct
            • Re: Length limit of 500 on primary keys?Dan Nelson24 Oct
Re: Length limit of 500 on primary keys?Heikki Tuuri23 Oct