List: General Discussion
From: terry jones
Date: June 29 1999 2:21am
Subject: optimising large query
Hi Andrew.

why don't you read the 2M values from the DB into memory and hash them
all up?

i know, you'd need 33MB to do this the naive way, but maybe you have
that much available?

with this scheme, lookup time will go WAY down (and, yes, memory usage
goes WAY up), since you get expected constant time lookups instead of
log time (or worse). as usual, there's a space/time tradeoff and you
get to choose.
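
to make that concrete, here's a rough sketch of the in-memory hash
table in python (load_words() is just a placeholder for however you
actually read the 2M values out of the table):

# build an in-memory hash table once, then answer lookups from it.
# load_words() is a placeholder: replace it with whatever reads the
# 2M values out of the database.
def load_words():
    return ["apple", "banana", "cherry"]

# memory cost is the data plus per-entry overhead, but every lookup
# afterwards is expected constant time.
words = set(load_words())

def in_db(word):
    # expected O(1) membership test, no trip to the server
    return word in words

print(in_db("banana"))   # True
print(in_db("durian"))   # False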


if you don't have that much memory, i'd suggest building a different
data structure that still allows fast lookups but uses less room. for
example, you could use some form of compression and decompress on the
fly when needed (decompression can be very fast), or you might be able
to sort incoming queries and decompress only what's needed for the
current batch, or use a trie (which may take more room, but maybe not).
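
if you do try the trie, here's a toy python version just to show the
shape of it; a real one would pack the nodes much more tightly, and
whether it beats the raw data for room depends on how much your keys
share prefixes:

class TrieNode:
    __slots__ = ("children", "terminal")
    def __init__(self):
        self.children = {}     # char -> TrieNode
        self.terminal = False  # True if a stored word ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.terminal

t = Trie()
for w in ("car", "cart", "care"):
    t.add(w)

print("cart" in t)   # True
print("ca" in t)     # False (only a prefix, not a stored word)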

another suggestion is to build a bitmap based on a hash function. with
less than 0.5 MB of RAM you could VERY quickly know that a word was
NOT in the database, and fall back to a slower query only to confirm
the words that might be in it. so you'd have at least half your
queries going by in constant (expected) time, maybe many more
(depending on how often the incoming words are actually in the DB).
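
the bitmap looks roughly like this (again python, with a placeholder
word list; crc32 and the 4M-bit size are just illustrative choices):

import zlib

NBITS = 4 * 1000 * 1000            # 4M bits = 0.5 MB of RAM
bitmap = bytearray(NBITS // 8)

def bit_index(word):
    # any cheap, well-mixed hash will do; crc32 is built in
    return zlib.crc32(word.encode()) % NBITS

def set_bit(word):
    i = bit_index(word)
    bitmap[i // 8] |= 1 << (i % 8)

def bit_is_set(word):
    i = bit_index(word)
    return (bitmap[i // 8] & (1 << (i % 8))) != 0

# build the bitmap once from the 2M values (placeholder list here)
for w in ("apple", "banana", "cherry"):
    set_bit(w)

def maybe_in_db(word):
    # bit clear -> word is definitely NOT in the database
    # bit set   -> word MIGHT be there; fall back to the slow query
    return bit_is_set(word)

print(maybe_in_db("apple"))   # True, so confirm with a real query
print(maybe_in_db("zzzz"))    # almost certainly False here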


in any case, if you want speed, your current approach is bound to be
slow (compared to a hashing scheme such as the above).

i have code that you can use to do the hashing (or the bitmapping) and
lookup if you'd like it. you'd simply have to modify it to read the
data into the hash table. let me know,

regards,
terry.
Thread
  optimising large query (Andrew Dunstan, 23 Jun)
  • optimising large query (Michael Widenius, 29 Jun)
  • optimising large query (terry jones, 29 Jun)
  • optimising large query (terry jones, 29 Jun)