List:General Discussion« Previous MessageNext Message »
From:John Stannard Date:June 6 2005 11:56pm
Subject:JOINs and composite indexes
View as plain text  
Conceptually, what I'd like to do is join two tables on a hostid for a
result like this non-join version where hostid in the 'IN' come from the
other table. 


mysql> explain select avg(load5min) from kstatHostData_20050513 where
hostid IN (250, 400) and localdate > '2005-06-06 13:00:00'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: kstatHostData_20050513
         type: range
possible_keys: hostid_gmtdate_idx,hostid_localdate_idx
          key: hostid_localdate_idx
      key_len: 11
          ref: NULL
         rows: 385
        Extra: Using where
1 row in set (0.01 sec)


This works great.  It correctly picks the hostid_localdate_idx index and
the query is fast.  

If I convert this to JOIN a table which contains exactly the same values
( two rows: 250, 400) as in the IN clause above, mysql picks either key
and only using hostid portion to complete the join, e.g., 

mysql> explain select avg(load5min) from kstatHostData_20050513 ks,
hostinfo hi WHERE ks.hostid = hi.hostid and  localdate > '2005-06-06
13:00:00'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: hi
         type: index
possible_keys: bar_idx
          key: bar_idx
      key_len: 4
          ref: NULL
         rows: 2
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: ks
         type: ref
possible_keys: hostid_gmtdate_idx,hostid_localdate_idx
          key: hostid_localdate_idx
      key_len: 3
          ref: postinistats.hi.hostid
         rows: 32338
        Extra: Using where
2 rows in set (0.00 sec)

mysql>

This query, needless to say, is abysmally slow.  The actual number of
applicable rows needed, as indicated in the first example where the full
index is used, is approximately 400.   It makes sense that either index
will do as they both start with the hostid and you can only use one
index, I'm just not sure how to work around it when the WHERE date
condition signifincantly limits the result. I tried a subquery which
yield results that actually appear worst than the join.  (See below).

Short of populating my IN block in the application with a pre-query, or
re-organizing data into summaries to limit rows, I'm out of ideas on how
to improve this.  Maybe I'm missing some obvious solution.   If anyone
has any thoughts or explanations, I'd really appeciate them.   Mysql is
4.1.12-standard.  

Thanks, 

John  


mysql> explain select avg(load5min) from kstatHostData_20050513 WHERE
hostid IN ( SELECT hostid FROM hostinfo) AND  localdate > '2005-06-06
13:00:00'\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: kstatHostData_20050513
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 26797461
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: hostinfo
         type: index_subquery
possible_keys: bar_idx
          key: bar_idx
      key_len: 4
          ref: func
         rows: 2
        Extra: Using index
2 rows in set (0.00 sec)

mysql>
Thread
JOINs and composite indexesJohn Stannard7 Jun