List:General Discussion« Previous MessageNext Message »
From:Shawn Green (MySQL) Date:October 22 2010 11:10am
Subject:Re: Percent of match in condition
View as plain text  
On 10/21/2010 9:57 AM, Ali A.F.N wrote:
> Hi All,
>
> I have a table with different fileds and almost the type of all them are
> smallint. I want to search on some fields with "OR" condition. I want to know is
> there possibility to know how many fileds matched exactly? then I can say how
> many percent match are available.
>
> select * from my_table where sex_id = 1 or country_id = 120 or education_id
>
> I mean if in my table there are some records with sex_id = 1 or country_id =
> 120  then I got 2 (2 match) then I can say 66% percent match.
>
> Thank you,

This is where having a name-value pair in your MySQL can help. Your data 
appears to belong to a dating site but it could easily belong to a 
product catalog or many other types of data sets. In this example,I want 
to compute product matches to see how close they are to my search criteria.

In rough symbolic terms, this is one layout that can help.

item table
=================
id
product_name
(other fields)

item qualities table
=================
quality_id
item_id
quality_name
quality_value

An index on (quality_name, quality_value, item_id) also comes in very 
handy right about now.

Let's say you wanted to look up all of the items that have color=blue, 
size=90cm, and flavor=peach (I really cannot imagine what this product 
may be, but after all this is only an example)

So, with a big wide table, you would need to either do something like

SELECT ... FROM old_style WHERE color='blue' and size='90cm' and 
flavor='peach';

But that would only find you an exact match.  For partial matches, you 
would need to construct all sorts of partial queries. like

... WHERE color='blue'
... WHERE color='blue' and size='90cm'
... WHERE color=size='90cm'
... WHERE color=size='90cm' and flavor='peach'
...

and compare the results.

Using the new tables, you construct 3 union queries in your code and 
cache the results in a temporary table:

CREATE TEMPORARY TABLE tmp_relevance ENGINE=MEMORY
(SELECT item_id FROM item_qualities WHERE color='blue')
UNION ALL
(SELECT item_id FROM item_qualities WHERE size='90cm')
UNION ALL
(SELECT item_id FROM item_qualities WHERE flavor='peach');

Then you count up how often each item_id was matched:

SELECT item_id, count(*) as frequency FROM tmp_relevance GROUP BY item_id;

Could even modify that last step to check for a threshold of matching 
(say only those that match at least half of the terms you are looking 
for) with something that looks like this

SELECT item_id, count(*) as frequency FROM tmp_relevance GROUP BY 
item_id HAVING frequency/(#of terms in the search) >=0.5 ;

Of course, you know the value of (# of terms in the search) because 
that's how many union queries you needed to run.

You can improve on this technique in many ways. Here is one from the top 
of my head:

Instead of returning only an item_id in the first query, you can also 
return a quality rating. Let's say you were looking for something sized 
90cm and you only have 88cm pieces in stock, that may return a match 
quality code of

1-(abs(90-88)/90)

You can combine that in the query against tmp_relevance to generate 
scores for near matches and not just exact partial matches.

Does this give you some ideas to build on?
-- 
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN
Thread
Percent of match in conditionAli A.F.N22 Oct
  • Re: Percent of match in conditionJohan De Meersman22 Oct
  • Re: Percent of match in conditionMySQL)22 Oct
  • RE: Percent of match in conditionTravis Ard22 Oct