>>>>> "Andrew" == Andrew Dunstan <andrewmd@stripped> writes:
Andrew> I have a table with about 700,000 rows as follows:
Andrew> k char(50) not null, v char(50) not null, primary key (k)
Andrew> I have a program that gets fed external data with about 2,000,000 k type
Andrew> values which must be filtered against this table (i.e. we must knock out
Andrew> from the 2m values those that appear in the table.
Andrew> I've been trying to benchmark this, and have the following data points:
Andrew> (this is all done in Perl/DBI using prepared statements for MySQL)
Andrew> Baseline: reading in external values and doing nothing: 654s
Andrew> Filtering against a BerkelyDb file (identical data): 1094s
Andrew> MySQL Filtering
Andrew> (select * from t where k = ?) 3497s
Andrew> (select * from t where k in (?,? ...[25 entries]) 1905s
Andrew> (select * from t where k = ? or k = ? or [25 entries] 1865s
Andrew> Of course, misses are expensive with the multivalued queries as k has to be
Andrew> compared against all the values provided.
MySQL makes a binary tree of the values in the 'in' part, so there
will not be a big penalty even if you add many values in the 'in'
part.
OR case will be searched sequentially until you get a miss.
Andrew> Does anyone have any suggestions as to how best to optimise this process?
Andrew> Ideally I'd like to get as close as possible to the BerkelyDb case, although
Andrew> I know that won't be entirely possible.
Have you tried to increase the key buffer and see if this helps?
The last option is to load all entries in a second table (with LOAD
DATA INFILE) and then do a join to find all matching rows...
Regards,
Monty