List:General Discussion« Previous MessageNext Message »
From:Sasha Pachev Date:June 2 1999 7:56pm
Subject:Re: many self-joins in select
View as plain text  
andreas pieper wrote:
> 
> hi,
> 
> i have come across a problem (yes!)
> 
> i have set up an database with about 100.000 small images and about
> 2.000.000 keywords for these images, so on average one image has about 20
> keywords describing the image.
> 
> i want to search the image database for images containing each of the given
> keywords.
> 
> a typical query (two keywords to search for) looks like this:
> 
> select distinct
>                 p.id
>         from
>                 product p,
>                 fotokeywords f0,
>                 fotokeywords f1
>         where
>                 f0.fotoid=p.id
>                 and f1.fotoid=p.id
>                 and f0.keyword like 'office%'
>                 and f1.keyword like 'chair%'
>         limit 201
> 
> the performance i get for these queries varies greatly, if one of the
> keywords is very seldom used, the query is very fast, if only one keyword
> is used, the query is very fast (no surprise), but if more keywords are
> given and each keyword is one of the 'more popular' keywords queries take
> ages to complete.
> 
> For output the above query (using two very common keywords) ends with
> ...
> 63 rows in set (22.72 sec) -- too long, i think, on a pII/233/64MB.
> 
> heres the output of explain for the above query:
> 
> +-------+--------+----------------+---------+---------+-----------+------+-
> ------+
> | table | type   | possible_keys  | key     | key_len | ref       | rows |
> Extra |
> +-------+--------+----------------+---------+---------+-----------+------+-
> ------+
> | f1    | range  | keyword,fotoid | keyword |    NULL | NULL      | 1289 |
> |
> | p     | eq_ref | PRIMARY        | PRIMARY |      80 | f1.fotoid |    1 |
> |
> | f0    | ref    | keyword,fotoid | fotoid  |      80 | p.id      |   22 |
> |
> +-------+--------+----------------+---------+---------+-----------+------+-
> ------+
> 
> as far as I can see, all the joins use keys. I have tried to fiddle with
> the mysql buffer sizes, but things did not improve. i ran isamchk -ar on
> all *ISM files, things did not improve.
> 
> I think I am missing something basic here.
> 
> best,
> 
> a
> 
> andreas pieper
> mego@stripped

can you use:

select distinct p.id from product as p, fotokeywords as
f 
 where f.id = p.id and (f.keyword = 'office' or
f.keyword = 'chair')

This way we are joining only two tables instead of
three. Notice that I also got rid of LIKE which will
give different results, but better. 
 performance.

Another thing that could make it work faster. Create
another table cal keyword(id int auto_increment primary
key, keyword char(30) not null, unique(keyword)) or
something of that nature. Then change fotokeywords to
store id of a keyword instead of the word itself. Your
query will become:

select distinct p.id from product as p, fotokeywords as
f, keyword as k 
 where f.id = p.id and (k.keyword = 'office' or
k.keyword = 'chair') and 
 k.id = f.keyword

-- 
Sasha Pachev
http://www.sashanet.com/ (home)
http://www.direct1.com/ (work)
Thread
many self-joins in selectandreas pieper2 Jun
  • Re: many self-joins in selectSasha Pachev3 Jun
    • Re: many self-joins in selectandreas pieper3 Jun
  • many self-joins in selectMichael Widenius3 Jun
    • Re: many self-joins in selectandreas pieper3 Jun
      • Re: many self-joins in selectMichael Widenius3 Jun
        • Re: many self-joins in selectandreas pieper4 Jun
Re: many self-joins in selectandreas pieper5 Jun