List:General Discussion« Previous MessageNext Message »
From:Aamir Baig Date:May 27 2003 3:51am
Subject:Query Challenge - Another Issue
View as plain text  
On the previous query optimization problem that I posted (for which no 
response yet), I've ran into another twist.
The key/value pairs can be duplicated . I-e an Entity can have multiple 
values for a key. This makes the count() > = no of conditions check 
invalid. Now, I'm really breaking my brain trying to figure out what's 
the accurate/fastest way to do this.

Here's the problem again:
Hello All,
I am using mysql 4.0.12 and am trying to build a parametric search 
engine.  The problem I believe can be essentially generalized as 

Have a table named Entity (PK EntityID) and another table called 
EntityProperties (PK EPID, Key varchar, Value varchar). EP stores 
key/value proeprties of E.

What I would like to be able to do is to return a list of Entities that 
match a given set of key/value pairs. The query in itself is simple:

Lets say I have 4 key value pairs  (key1, value1) , (key2, value2), 
(key3, value3), (key4, value4) entered for search. The resulting search 
query would look like:

Select E.*
Entity E, EntityProperties EP
(EP.KEY = key1 AND EP.VALUE = value1) OR
(EP.KEY = key2 AND EP.VALUE = value2) OR
(EP.KEY = key3 AND EP.VALUE = value3) OR
(EP.KEY = key4 AND EP.VALUE = value4) OR
Having count(E.EID) >= 4;

The count is 4, as there are 4 matching conditions required.

The problem comes in when I have like 200,000 rows in E and 2 million 
rows in EP, the query above just crawls to like 20 secs average.

So I create a 2 column index on EP (Key, value), (To do that I had to 
change the table type to MyISAM from InnoDB as Value is of type Text 
which does not support index creation on InnoDB), and the query does 
become faster, comes down to around 7 secs avg.

But a) This is still not fast enough, I would like it to be under a 
second, and b) I expect my EP to grow upto around 20-30 millions rows, 
so it certainly won't scale too well.

I did think about creating key-value pair keywords, creating a column 
in E to store them and then create a FULL-TEXT index and use a IN 
BOOLEAN MODE search, however, In the key/value match conditions I use 
above, I might also need comparison operators..i-e something like:
EP.KEY = key1 and EP.VALUE > value1
and that the current mysql full-text implementation does not allow.

I was also wondering if hash indexes can be used somehow, But I have no 
idea a) how they work in mysql and b) whether they would be useful.

I would appreciate any advice/guidance in optimizing this query or a 
totally different solution that accomplishes the same result and 
showing me how to do this right.

Thank you,
Aamir Baig

Query Challenge - Another IssueAamir Baig27 May