List: General Discussion
From: Michael Widenius  Date: June 29 1999 1:18am
Subject: optimising large query
>>>>> "Andrew" == Andrew Dunstan <andrewmd@stripped> writes:

Andrew> I have a table with about 700,000 rows as follows:

Andrew> k char(50) not null, v char(50) not null, primary key (k)

Andrew> I have a program that gets fed external data with about 2,000,000 k type 
Andrew> values which must be filtered against this table (i.e. we must knock out 
Andrew> from the 2m values those that appear in the table).

Andrew> I've been trying to benchmark this, and have the following data points:

Andrew> (this is all done in Perl/DBI using prepared statements for MySQL)
Andrew> Baseline: reading in external values and doing nothing:   654s
Andrew> Filtering against a BerkeleyDB file (identical data):    1094s
Andrew> MySQL Filtering
Andrew>    (select * from t where k = ?)                         3497s
Andrew>    (select * from t where k in (?,? ...[25 entries])     1905s
Andrew>    (select * from t where k = ? or k = ? or [25 entries] 1865s

Andrew> Of course, misses are expensive with the multivalued queries as k has to be 
Andrew> compared against all the values provided.

MySQL builds a binary tree of the values in the IN list, so there is
not a big penalty even if you put many values in the IN.

The OR case is searched sequentially, value by value, until a match is
found (so a miss has to test every value in the chain).
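For example, the 25-at-a-time IN batching from the benchmark above can be sketched like this. This is a minimal stand-alone sketch, not the original Perl/DBI program: sqlite3 stands in for MySQL so it runs anywhere, and the table name `t`, column names `k`/`v`, and the sample key values are assumptions matching the schema in the post.

```python
import sqlite3

# Stand-in for the MySQL table t (k char(50) primary key, v char(50)).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT NOT NULL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("key%d" % i, "val%d" % i) for i in range(1000)])

# Stand-in for the externally fed k values to be filtered.
external = ["key%d" % i for i in range(0, 2000, 3)]

def filter_batched(keys, batch=25):
    """Knock out keys that appear in t, querying 25 at a time with IN."""
    survivors = []
    for i in range(0, len(keys), batch):
        chunk = keys[i:i + batch]
        placeholders = ",".join("?" * len(chunk))
        hits = {row[0] for row in conn.execute(
            "SELECT k FROM t WHERE k IN (%s)" % placeholders, chunk)}
        survivors.extend(k for k in chunk if k not in hits)
    return survivors

kept = filter_batched(external)
```

Each round trip tests 25 keys at once, which is where the drop from 3497s to 1905s in the benchmark comes from: the per-query overhead is amortised over the batch.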

Andrew> Does anyone have any suggestions as to how best to optimise this process? 
Andrew> Ideally I'd like to get as close as possible to the BerkeleyDB case, although 
Andrew> I know that won't be entirely possible.

Have you tried increasing the key buffer to see if this helps?
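A bigger key buffer lets more of the primary-key index stay in memory between lookups. A minimal my.cnf sketch, assuming the `set-variable` syntax of MySQL of that era; the 32M figure is an assumption and should be sized to the actual index, not copied:

```ini
# my.cnf -- assumed buffer size; tune to the size of t's key index
[mysqld]
set-variable = key_buffer=32M
```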

The last option is to load all entries in a second table (with LOAD
DATA INFILE) and then do a join to find all matching rows...
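The load-and-join approach can be sketched as follows. This is a hedged stand-in, not the MySQL commands themselves: sqlite3 replaces the server so the sketch is self-contained, an ordinary bulk INSERT replaces LOAD DATA INFILE, and the scratch-table name `incoming` is an assumption. The anti-join itself uses the LEFT JOIN ... IS NULL idiom, since MySQL of that era had no subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT NOT NULL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("key%d" % i, "v") for i in range(700)])

# Stand-in for LOAD DATA INFILE: bulk-load the external values
# into a scratch table.
conn.execute("CREATE TABLE incoming (k TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO incoming VALUES (?)",
                 [("key%d" % i,) for i in range(500, 1000)])

# Anti-join: every incoming key with no matching row in t survives.
rows = conn.execute("""
    SELECT incoming.k
    FROM incoming LEFT JOIN t ON incoming.k = t.k
    WHERE t.k IS NULL
""").fetchall()
survivors = [r[0] for r in rows]
```

This turns two million individual lookups into one bulk load plus one indexed join, which is usually the closest a SQL server gets to the BerkeleyDB numbers in the benchmark.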
