>>>>> "andreas" == andreas pieper <mego@stripped> writes:
andreas> hi,
andreas> i have come across a problem (yes!)
andreas> i have set up an database with about 100.000 small images and about
andreas> 2.000.000 keywords for these images, so on average one image has about 20
andreas> keywords describing the image.
andreas> i want to search the image database for images containing each of the given
andreas> keywords.
andreas> a typical query (two keywords to search for) looks like this:
andreas> select distinct
andreas> p.id
andreas> from
andreas> product p,
andreas> fotokeywords f0,
andreas> fotokeywords f1
andreas> where
andreas> f0.fotoid=p.id
andreas> and f1.fotoid=p.id
andreas> and f0.keyword like 'office%'
andreas> and f1.keyword like 'chair%'
andreas> limit 201
andreas> the performance i get for these queries varies greatly, if one of the
andreas> keywords is very seldom used, the query is very fast, if only one keyword
andreas> is used, the query is very fast (no surprise), but if more keywords are
andreas> given and each keyword is one of the 'more popular' keywords queries take
andreas> ages to complete.
andreas> For output the above query (using two very common keywords) ends with
andreas> ...
andreas> 63 rows in set (22.72 sec) -- too long, i think, on a pII/233/64MB.
andreas> heres the output of explain for the above query:
andreas> +-------+--------+----------------+---------+---------+-----------+------+-
andreas> ------+
andreas> | table | type | possible_keys | key | key_len | ref | rows |
andreas> Extra |
andreas> +-------+--------+----------------+---------+---------+-----------+------+-
andreas> ------+
andreas> | f1 | range | keyword,fotoid | keyword | NULL | NULL | 1289 |
andreas> |
andreas> | p | eq_ref | PRIMARY | PRIMARY | 80 | f1.fotoid | 1 |
andreas> |
andreas> | f0 | ref | keyword,fotoid | fotoid | 80 | p.id | 22 |
andreas> |
andreas> +-------+--------+----------------+---------+---------+-----------+------+-
andreas> ------+
andreas> as far as I can see, all the joins use keys. I have tried to fiddle with
andreas> the mysql buffer sizes, but things did not improve. i ran isamchk -ar on
andreas> all *ISM files, things did not improve.
andreas> I think I am missing something basic here.
Hi!
You can try by adding an index on (id,keyword) and modify your query
to be:
select small_result straight_join distinct
p.id
from
fotokeywords f0,
fotokeywords f1
product p,
where
f0.fotoid=p.id
and f1.fotoid=f0.fotoid and
and f0.keyword like 'office%'
and f1.keyword like 'chair%'
limit 201
This should make the above query a bit faster!
Regards,
Monty