andreas pieper wrote:
>
> hi,
>
> i have come across a problem (yes!)
>
> i have set up an database with about 100.000 small images and about
> 2.000.000 keywords for these images, so on average one image has about 20
> keywords describing the image.
>
> i want to search the image database for images containing each of the given
> keywords.
>
> a typical query (two keywords to search for) looks like this:
>
> select distinct
> p.id
> from
> product p,
> fotokeywords f0,
> fotokeywords f1
> where
> f0.fotoid=p.id
> and f1.fotoid=p.id
> and f0.keyword like 'office%'
> and f1.keyword like 'chair%'
> limit 201
>
> the performance i get for these queries varies greatly, if one of the
> keywords is very seldom used, the query is very fast, if only one keyword
> is used, the query is very fast (no surprise), but if more keywords are
> given and each keyword is one of the 'more popular' keywords queries take
> ages to complete.
>
> For output the above query (using two very common keywords) ends with
> ...
> 63 rows in set (22.72 sec) -- too long, i think, on a pII/233/64MB.
>
> heres the output of explain for the above query:
>
> +-------+--------+----------------+---------+---------+-----------+------+-
> ------+
> | table | type | possible_keys | key | key_len | ref | rows |
> Extra |
> +-------+--------+----------------+---------+---------+-----------+------+-
> ------+
> | f1 | range | keyword,fotoid | keyword | NULL | NULL | 1289 |
> |
> | p | eq_ref | PRIMARY | PRIMARY | 80 | f1.fotoid | 1 |
> |
> | f0 | ref | keyword,fotoid | fotoid | 80 | p.id | 22 |
> |
> +-------+--------+----------------+---------+---------+-----------+------+-
> ------+
>
> as far as I can see, all the joins use keys. I have tried to fiddle with
> the mysql buffer sizes, but things did not improve. i ran isamchk -ar on
> all *ISM files, things did not improve.
>
> I think I am missing something basic here.
>
> best,
>
> a
>
> andreas pieper
> mego@stripped
can you use:
select distinct p.id from product as p, fotokeywords as
f
where f.id = p.id and (f.keyword = 'office' or
f.keyword = 'chair')
This way we are joining only two tables instead of
three. Notice that I also got rid of LIKE which will
give different results, but better.
performance.
Another thing that could make it work faster. Create
another table cal keyword(id int auto_increment primary
key, keyword char(30) not null, unique(keyword)) or
something of that nature. Then change fotokeywords to
store id of a keyword instead of the word itself. Your
query will become:
select distinct p.id from product as p, fotokeywords as
f, keyword as k
where f.id = p.id and (k.keyword = 'office' or
k.keyword = 'chair') and
k.id = f.keyword
--
Sasha Pachev
http://www.sashanet.com/ (home)
http://www.direct1.com/ (work)