On 10/21/2010 9:57 AM, Ali A.F.N wrote:
> Hi All,
> I have a table with different fileds and almost the type of all them are
> smallint. I want to search on some fields with "OR" condition. I want to know is
> there possibility to know how many fileds matched exactly? then I can say how
> many percent match are available.
> select * from my_table where sex_id = 1 or country_id = 120 or education_id
> I mean if in my table there are some records with sex_id = 1 or country_id =
> 120 then I got 2 (2 match) then I can say 66% percent match.
> Thank you,
This is where having a name-value pair in your MySQL can help. Your data
appears to belong to a dating site but it could easily belong to a
product catalog or many other types of data sets. In this example,I want
to compute product matches to see how close they are to my search criteria.
In rough symbolic terms, this is one layout that can help.
item qualities table
An index on (quality_name, quality_value, item_id) also comes in very
handy right about now.
Let's say you wanted to look up all of the items that have color=blue,
size=90cm, and flavor=peach (I really cannot imagine what this product
may be, but after all this is only an example)
So, with a big wide table, you would need to either do something like
SELECT ... FROM old_style WHERE color='blue' and size='90cm' and
But that would only find you an exact match. For partial matches, you
would need to construct all sorts of partial queries. like
... WHERE color='blue'
... WHERE color='blue' and size='90cm'
... WHERE color=size='90cm'
... WHERE color=size='90cm' and flavor='peach'
and compare the results.
Using the new tables, you construct 3 union queries in your code and
cache the results in a temporary table:
CREATE TEMPORARY TABLE tmp_relevance ENGINE=MEMORY
(SELECT item_id FROM item_qualities WHERE color='blue')
(SELECT item_id FROM item_qualities WHERE size='90cm')
(SELECT item_id FROM item_qualities WHERE flavor='peach');
Then you count up how often each item_id was matched:
SELECT item_id, count(*) as frequency FROM tmp_relevance GROUP BY item_id;
Could even modify that last step to check for a threshold of matching
(say only those that match at least half of the terms you are looking
for) with something that looks like this
SELECT item_id, count(*) as frequency FROM tmp_relevance GROUP BY
item_id HAVING frequency/(#of terms in the search) >=0.5 ;
Of course, you know the value of (# of terms in the search) because
that's how many union queries you needed to run.
You can improve on this technique in many ways. Here is one from the top
of my head:
Instead of returning only an item_id in the first query, you can also
return a quality rating. Let's say you were looking for something sized
90cm and you only have 88cm pieces in stock, that may return a match
quality code of
You can combine that in the query against tmp_relevance to generate
scores for near matches and not just exact partial matches.
Does this give you some ideas to build on?
MySQL Principal Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN