List:General Discussion« Previous MessageNext Message »
From:Michael Widenius Date:June 3 1999 6:54am
Subject:many self-joins in select
View as plain text  
>>>>> "andreas" == andreas pieper <mego@stripped> writes:

andreas> hi,
andreas> i have come across a problem (yes!)

andreas> i have set up an database with about 100.000 small images and about
andreas> 2.000.000 keywords for these images, so on average one image has about 20
andreas> keywords describing the image.

andreas> i want to search the image database for images containing each of the given
andreas> keywords. 

andreas> a typical query (two keywords to search for) looks like this:

andreas> select distinct 
andreas> 		p.id 
andreas> 	from 
andreas> 		product p, 
andreas> 		fotokeywords f0, 
andreas> 		fotokeywords f1  
andreas> 	where 
andreas> 		f0.fotoid=p.id 
andreas> 		and f1.fotoid=p.id 
andreas> 		and f0.keyword like 'office%' 
andreas> 		and f1.keyword like 'chair%' 
andreas> 	limit 201

andreas> the performance i get for these queries varies greatly, if one of the
andreas> keywords is very seldom used, the query is very fast, if only one keyword
andreas> is used, the query is very fast (no surprise), but if more keywords are
andreas> given and each keyword is one of the 'more popular' keywords queries take
andreas> ages to complete.

andreas> For output the above query (using two very common keywords) ends with
andreas> ...
andreas> 63 rows in set (22.72 sec) -- too long, i think, on a pII/233/64MB.


andreas> heres the output of explain for the above query:

andreas> +-------+--------+----------------+---------+---------+-----------+------+-
andreas> ------+
andreas> | table | type   | possible_keys  | key     | key_len | ref       | rows |
andreas> Extra |
andreas> +-------+--------+----------------+---------+---------+-----------+------+-
andreas> ------+
andreas> | f1    | range  | keyword,fotoid | keyword |    NULL | NULL      | 1289 |
andreas> |
andreas> | p     | eq_ref | PRIMARY        | PRIMARY |      80 | f1.fotoid |    1 |
andreas> |
andreas> | f0    | ref    | keyword,fotoid | fotoid  |      80 | p.id      |   22 |
andreas> |
andreas> +-------+--------+----------------+---------+---------+-----------+------+-
andreas> ------+

andreas> as far as I can see, all the joins use keys. I have tried to fiddle with
andreas> the mysql buffer sizes, but things did not improve. i ran isamchk -ar on
andreas> all *ISM files, things did not improve.

andreas> I think I am missing something basic here.

Hi!

You can try by adding an index on (id,keyword) and modify your query
to be:

select small_result straight_join distinct 
		p.id 
	from 
		fotokeywords f0, 
		fotokeywords f1  
		product p, 
	where 
		f0.fotoid=p.id 
		and f1.fotoid=f0.fotoid and
		and f0.keyword like 'office%' 
		and f1.keyword like 'chair%' 
	limit 201

This should make the above query a bit faster!

Regards,
Monty
Thread
many self-joins in selectandreas pieper2 Jun
  • Re: many self-joins in selectSasha Pachev3 Jun
    • Re: many self-joins in selectandreas pieper3 Jun
  • many self-joins in selectMichael Widenius3 Jun
    • Re: many self-joins in selectandreas pieper3 Jun
      • Re: many self-joins in selectMichael Widenius3 Jun
        • Re: many self-joins in selectandreas pieper4 Jun
Re: many self-joins in selectandreas pieper5 Jun