List:General Discussion« Previous MessageNext Message »
From:Shawn Green Date:January 5 2010 2:18pm
Subject:Re: Join with OR-condition and Indexes
View as plain text  
Tobias Schultze wrote:
> Hello,
> 
>  
> 
> I'm working on an application for my bachelor thesis.
> 
> I'm having a performance problem with a SQL-Query in MySQL5.
> 
> I hoped someone can easily enlighten me in this issue.
> 
>  
> 
> The schema:
> 
>  
> 
> CREATE TABLE IF NOT EXISTS `athletes` (
> 
>   `id` int(11) NOT NULL AUTO_INCREMENT,
> 
>   `last_name` varchar(20) NOT NULL,
> 
>   `first_name` varchar(20) NOT NULL,
> 
>   `gender` enum('male','female') NOT NULL,
> 
>   `birthday` date NOT NULL,
> 
>   `country` char(2) NOT NULL,
> 
>   `club_id` int(11) NOT NULL,
> 
>   `is_active` tinyint(1) NOT NULL,
> 
>   PRIMARY KEY (`id`),
> 
>   KEY `gender_index_idx` (`gender`),
> 
>   KEY `is_active_index_idx` (`is_active`),
> 
>   KEY `club_id_idx` (`club_id`)
> 
> ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COMMENT='Sportler bzw Mitglieder des
> Verbandes';
> 
>  
> 
> -- --------------------------------------------------------
> 
>  
> 
> CREATE TABLE IF NOT EXISTS `matches` (
> 
>   `id` int(11) NOT NULL AUTO_INCREMENT,
> 
>   `teammatch_id` int(11) NOT NULL,
> 
>   `match_type` varchar(10) NOT NULL,
> 
>   `team1_player_id` int(11) DEFAULT NULL,
> 
>   `team1_partner_id` int(11) DEFAULT NULL,
> 
>   `team2_player_id` int(11) DEFAULT NULL,
> 
>   `team2_partner_id` int(11) DEFAULT NULL,
> 
>   `team1_score` tinyint(3) unsigned DEFAULT NULL,
> 
>   `team2_score` tinyint(3) unsigned DEFAULT NULL,
> 
>   PRIMARY KEY (`id`),
> 
>   KEY `teammatch_id_idx` (`teammatch_id`),
> 
>   KEY `team1_player_id_idx` (`team1_player_id`),
> 
>   KEY `team1_partner_id_idx` (`team1_partner_id`),
> 
>   KEY `team2_player_id_idx` (`team2_player_id`),
> 
>   KEY `team2_partner_id_idx` (`team2_partner_id`)
> 
> ) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COMMENT='Spiele zwischen zwei oder
> vier Sportlern (Einzel und Doppel)' AUTO_INCREMENT=46665 ;
> 
>  
> 
>  
> 
> I want to get all matches for each athlete and calculate statistics such as
> number of matches etc.
> 
> The basic very simplified query is like
> 
>  
> 
> SELECT a.id, COUNT(*)
> 
> FROM athletes a
> 
> LEFT JOIN matches m ON ( 
> 
>             m.team1_player_id = a.id OR 
> 
>             m.team1_partner_id = a.id OR 
> 
>             m.team2_player_id = a.id OR 
> 
>             m.team2_partner_id = a.id
> 
> )
> 
> WHERE a.gender = 'female'
> 
> GROUP BY a.id
> 
>  
> 
>  
> 
> Now the problem is, that mysql uses a full table scan to retrieve the
> matches for an athlete, so the execution takes many seconds or even worse.
> 
> An athlete can be referenced in any of the m.team1_player_id OR
> m.team1_partner_id OR m.team2_player_id OR m.team2_partner_id. (That allows
> doubles matches.)
> 
> Why is an full table scan necessary although there is an index on each of
> these fields? So an index exists for each OR-part of the join condition...
> 
>  
> 
> Here the execution plan:
> 
>  
> 
> 
> id 
> 
> select_type 
> 
> table 
> 
> type 
> 
> possible_keys 
> 
> key 
> 
> key_len 
> 
> ref 
> 
> rows 
> 
> Extra 
> 
> 
> 1
> 
> SIMPLE
> 
> a
> 
> ref
> 
> gender_index_idx
> 
> gender_index_idx
> 
> 1
> 
> const
> 
> 2193
> 
> Using where; Using temporary; Using filesort
> 
> 
> 1
> 
> SIMPLE
> 
> m
> 
> ALL
> 
> team1_player_id_idx,team1_partner_id_idx,team2_pla...
> 
> NULL
> 
> NULL
> 
> NULL
> 
> 46664
> 
>  
> 
>  
> 
>  
> 
> Joining on each fields like the following is very fast and uses the index
> but of course doesn't give me the expected result.
> 
>  
> 
> FROM athletes a
> 
> LEFT JOIN matches m ON (a.id = m.team1_player_id)
> 
> LEFT JOIN matches m2 ON (a.id = m2.team2_player_id)
> 
>  
> 
>  
> 
> Maybe I need to do a workaround using a UNION?
> 
> But this doesn't help either: (It takes 76 seconds)
> 
>  
> 
> FROM athletes a
> 
> LEFT JOIN (
> 
> (
> 
> SELECT team1_player_id AS player_id, teammatch_id, match_type, team1_score,
> team2_score, team1_points, team2_points, no_fight 
> 
> FROM matches
> 
> ) 
> 
> UNION 
> 
> (
> 
> SELECT team1_partner_id, teammatch_id, match_type, team1_score, team2_score,
> team1_points, team2_points, no_fight 
> 
> FROM matches
> 
> )
> 
> UNION 
> 
> (
> 
> SELECT team2_player_id, teammatch_id, match_type, team1_score, team2_score,
> team1_points, team2_points, no_fight 
> 
> FROM matches
> 
> )
> 
> UNION 
> 
> (
> 
> SELECT team2_partner_id, teammatch_id, match_type, team1_score, team2_score,
> team1_points, team2_points, no_fight 
> 
> FROM matches
> 
> )
> 
> ) m ON (a.id = m.player_id)
> 
>  
> 
>  
> 
>  
> 
> I hope someone can help me with this.
> 
> Thanks in advance.
> 
>  

I think the problem here is that you have an ordered vector table 
(main1, partner1, main2, partner2) which requires you to optionally 
match on 4 separate columns to make your JOIN condition because the 
athlete you want to locate could be in any of those 4 places in the 
match record.

What would be faster is to normalize your match data into two tables. 
The first table would be just match information (id, date, location, 
etc) and another table of match-athlete pairs (match_id, athlete_id)

alternatively you would have to use a query like
(
select...
FROM athletes
INNER JOIN matches
   ON ... matches.member1
) UNION (
select...
FROM athletes
INNER JOIN matches
   ON ... matches.partner1
) UNION (
select...
FROM athletes
INNER JOIN matches
   ON ... matches.member2
) UNION (
select...
FROM athletes
INNER JOIN matches
   ON ... matches.partner2
)

in order to use the indexes for the JOINS.

-- 
Shawn Green, MySQL Senior Support Engineer
Sun Microsystems, Inc.
Office: Blountville, TN


Thread
Join with OR-condition and IndexesTobias Schultze4 Jan
  • Re: Join with OR-condition and IndexesShawn Green5 Jan
    • Re: Join with OR-condition and IndexesMichael Dykman5 Jan
      • Re: Join with OR-condition and IndexesTobias Schultze6 Jan
        • Re: Join with OR-condition and IndexesJoerg Bruehe6 Jan