List:General Discussion« Previous MessageNext Message »
From:Jia Chen Date:September 6 2009 6:51pm
Subject:Re: How to optimize a slow query?
View as plain text  
Hi Mike,

Thanks for your detailed answer. Now, I understand what you mean. And, 
yes, I agree with you that keeping all data in one table works better 
for a bunch of 1:1 relationship tables.  Actually, this is what I was 
trying to do with that query.

Since you mention "They all had a 1:1 relationship and occasionally some 
of the tables did not have a corresponding row." and "From then on I've 
merged all 8 tables into one and if any of the subordinate table data 
isn't available for a row, its columns are set to NULL", I do want to 
ask you about how you set the columns to NULL for rows in subordinate 
table data unavailable because I have similar situation.

If I want to combine two tables into one, I think that a full outer join 
can achieve what you did. However, MySQL does not have full outer join. 
So, I am using

create table rmpdata1
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode,
           coalesce(ri.ndate,mv.ndate) as cndate
from
        RItime as ri left outer join MVtime as mv
on
        (ri.code=mv.code and ri.ndate=mv.ndate))
union
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode,
           coalesce(ri.ndate,mv.ndate) as cndate
from
        RItime as ri right outer join MVtime as mv
on
        (ri.code=mv.code and ri.ndate=mv.ndate));

This query takes more than twice as much time as the query in my first 
e-mail. Do you have a better way? Thanks.

Best,
Jia

mos wrote:
> Jia,
>
>> Yes, it is a 1:1 relationship between table RItime and MVtime. 
>> However, I don't get your suggestion, "I'd recommend joining the two 
>> tables into 1 table so you don't have to join them in the first 
>> place."  Could you elaborate that?
>
> Sure but first I have to relate it to my own experience. I had 8 
> tables of around 25 million rows each. They all had a 1:1 relationship 
> and occasionally some of the tables did not have a corresponding row. 
> I felt it was better from a design point of view to have 8 different 
> tables and do the joins on the tables that I needed for each of my 
> particular queries. I'd be joining anywhere from 2 to 5 or 6 or even 
> all 8 tables at a time, using a where clause to select 15k rows at a 
> time. This is the way to do it from a normalized point of view. All of 
> the information is in its respective table and only assemble the 
> tables for each particular query.
>
> Well, this was sloooowww! A heck of a lot of work was done to join the 
> tables together on a 2 column key (like yours). I also had to run 
> maintenance on the tables to see which tables where corrupted or were 
> missing rows that should have been there. The tables also repeated 
> columns from the other tables like date and product_id that is used to 
> help identify each row. Well to make a long story short, it was far 
> too much effort to juggle the relationships between all of these tables.
>
> Then a colleague made the monumental announcement by saying "I've 
> never found the need to use more than 1 table when there was a 1:1 
> relationship. There is a tremendous speed cost involved in piecing the 
> data back together. I put all of the data into 1 table".  So the light 
> went on for me. From then on I've merged all 8 tables into one and if 
> any of the subordinate table data isn't available for a row, its 
> columns are set to NULL, which is the values they would have had 
> anyway after a left join.
>
> I am perfectly happy know with one wide table with over 100 columns. 
> Everything is in its place and maintenance is a dream. Queries are 
> also quite fast because all of the information is under one table and 
> not 8. I don't have to worry about optimizing the indexes for the 
> table joins because there aren't any joins between these tables 
> because it is all in 1 row.
>
> So you really have to ask yourself, why spend  10 minutes each time 
> your query is run? Instead you eliminate the query altogether by 
> keeping the data of the 2 tables into 1 table in the first place.
>
> Mike
>
>
>
> At 09:45 AM 9/6/2009, Jia Chen wrote:
>> Thanks for your reply, Mike.
>>
>> Yes, 13419851 rows were added to rmpdata1. However, 10 minutes seem 
>> to be too long. I run the same join by using SQL procedure in a 
>> statistical software called SAS on a similar machine. It only takes 1 
>> minute and 3 seconds.
>>
>> Yes, it is a 1:1 relationship between table RItime and MVtime. 
>> However, I don't get your suggestion, "I'd recommend joining the two 
>> tables into 1 table so you don't have to join them in the first 
>> place."  Could you elaborate that?
>>
>> I was using B-tree index. Switching to HASH does help. Now, it takes 
>> 4 min 50.17 sec to run the query.
>> I also turn on profiling by using
>> mysql> set profiling = 1;
>> Query OK, 0 rows affected (0.01 sec)
>>
>> After the query finishes, I get
>> mysql> show profile;
>> +----------------------+------------+
>> | Status               | Duration   |
>> +----------------------+------------+
>> | starting             |   0.000123 |
>> | checking permissions |   0.000010 |
>> | Opening tables       |   0.000044 |
>> | System lock          |   0.000007 |
>> | Table lock           |   0.000011 |
>> | init                 |   0.000083 |
>> | creating table       |   0.003428 |
>> | After create         |   0.000124 |
>> | System lock          |   0.000004 |
>> | Table lock           |   0.000051 |
>> | optimizing           |   0.000007 |
>> | statistics           |   0.000033 |
>> | preparing            |   0.000020 |
>> | executing            |   0.000004 |
>> | Sending data         | 290.153530 |
>> | end                  |   0.000008 |
>> | end                  |   0.000004 |
>> | query end            |   0.000003 |
>> | freeing items        |   0.000010 |
>> | closing tables       |   0.000025 |
>> | logging slow query   |   0.000001 |
>> | logging slow query   |   0.013429 |
>> | cleaning up          |   0.000004 |
>> +----------------------+------------+
>> 23 rows in set (0.02 sec)
>>
>> MySQL spends most of its time sending data. According to 
>> http://dev.mysql.com/doc/refman/5.0/en/general-thread-states.html, 
>> sending data means that "the thread is processing rows for a |SELECT| 
>> <http://dev.mysql.com/doc/refman/5.0/en/select.html> statement and 
>> also is sending data to the client."  Is there more room to optimize 
>> this query?  Thanks again.
>>
>> Best,
>> Jia
>>
>> mos wrote:
>>> How many rows were added to rmpdata1 table? If it is 13.4 million 
>>> rows then it is going to take several minutes to join this many rows 
>>> from the 2 tables.
>>> Is there a 1:1 relationship between the two tables or a 1:Many? If 
>>> there is a 1:1 then I'd recommend joining the two tables into 1 
>>> table so you don't have to join them in the first place.
>>> The only other thing I can suggest is to change the type of index on 
>>> the tables being joined to see if that makes a speed difference. For 
>>> example, if you are using BTREE then switch to HASH or vice versa. 
>>> See http://dev.mysql.com/doc/refman/5.0/en/create-index.html for 
>>> more info.
>>>
>>> Mike
>>>
>>> At 10:05 AM 9/5/2009, Jia Chen wrote:
>>>> Hi there,
>>>>
>>>> One simple query took more than 10 minutes.  Here is how relevant 
>>>> rows in the slow query log looks like:
>>>>
>>>> # Time: 090905 10:49:57
>>>> # User@Host: root[root] @ localhost []
>>>> # Query_time: 649  Lock_time: 0  Rows_sent: 0  Rows_examined: 26758561
>>>> use world;
>>>> create table rmpdata1 select ri.*,
>>>> mv.MV, coalesce(ri.code,mv.code) as ccode,
>>>> coalesce(ri.ndate,mv.ndate) as cndate
>>>> from RItime as ri left outer join
>>>> MVtime as mv
>>>> on (ri.code=mv.code and
>>>> ri.ndate=mv.ndate);
>>>>
>>>> When I "explain" only the select clause, I get
>>>> ------------+----------+-------+
>>>> | id | select_type | table | type   | possible_keys | key     | 
>>>> key_len | ref                          | rows     | Extra |
>>>>
> +----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
> 
>>>>
>>>> |  1 | SIMPLE      | ri    | ALL    | NULL          | NULL    | NULL
>>>> | NULL                         | 13419851 |       |
>>>> |  1 | SIMPLE      | mv    | eq_ref | PRIMARY       | PRIMARY | 11
>>>> | world.ri.code,world.ri.ndate |        1 |       |
>>>>
> +----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
> 
>>>>
>>>> 2 rows in set (0.00 sec)
>>>>
>>>> I use "show table status from world;" to get information about two 
>>>> tables, RItime and MVtime, in the join clause:
>>>>           Name: RItime
>>>>         Engine: MyISAM
>>>>        Version: 10
>>>>     Row_format: Dynamic
>>>>           Rows: 13419851
>>>> Avg_row_length: 31
>>>>    Data_length: 427721848
>>>> Max_data_length: 281474976710655
>>>>   Index_length: 347497472
>>>>      Data_free: 0
>>>> Auto_increment: NULL
>>>>    Create_time: 2009-09-03 10:17:57
>>>>    Update_time: 2009-09-03 12:04:02
>>>>     Check_time: NULL
>>>>      Collation: latin1_swedish_ci
>>>>       Checksum: NULL
>>>> Create_options:
>>>>        Comment:
>>>> *************************** 2. row ***************************
>>>>           Name: MVtime
>>>>         Engine: MyISAM
>>>>        Version: 10
>>>>     Row_format: Dynamic
>>>>           Rows: 13562373
>>>> Avg_row_length: 31
>>>>    Data_length: 430220056
>>>> Max_data_length: 281474976710655
>>>>   Index_length: 350996480
>>>>      Data_free: 0
>>>> Auto_increment: NULL
>>>>    Create_time: 2009-09-03 13:31:33
>>>>    Update_time: 2009-09-03 13:43:51
>>>>     Check_time: NULL
>>>>      Collation: latin1_swedish_ci
>>>>       Checksum: NULL
>>>> Create_options:
>>>>        Comment:
>>>>
>>>> I also describe these two tables:
>>>> mysql> desc RItime;
>>>> +-------+------------+------+-----+------------+-------+
>>>> | Field | Type       | Null | Key | Default    | Extra |
>>>> +-------+------------+------+-----+------------+-------+
>>>> | code  | varchar(6) | NO   | PRI |            |       |
>>>> | ndate | date       | NO   | PRI | 0000-00-00 |       |
>>>> | ri    | double     | YES  |     | NULL       |       |
>>>> | time  | date       | YES  |     | NULL       |       |
>>>> | bdate | date       | YES  |     | NULL       |       |
>>>> +-------+------------+------+-----+------------+-------+
>>>> 5 rows in set (0.00 sec)
>>>>
>>>> mysql> desc MVtime;
>>>> +-------+------------+------+-----+------------+-------+
>>>> | Field | Type       | Null | Key | Default    | Extra |
>>>> +-------+------------+------+-----+------------+-------+
>>>> | code  | varchar(6) | NO   | PRI |            |       |
>>>> | ndate | date       | NO   | PRI | 0000-00-00 |       |
>>>> | MV    | double     | YES  |     | NULL       |       |
>>>> | time  | date       | YES  |     | NULL       |       |
>>>> | bdate | date       | YES  |     | NULL       |       |
>>>> +-------+------------+------+-----+------------+-------+
>>>> 5 rows in set (0.00 sec)
>>>>
>>>> Could you give me some hint on how to improve the speed of this query?
>>>> Thanks.
>>>>
>>>> Best,
>>>> Jia
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> MySQL General Mailing List
>>>> For list archives: http://lists.mysql.com/mysql
>>>> To unsubscribe:    
>>>> http://lists.mysql.com/mysql?unsub=1
>>>
>
>

Thread
How to optimize a slow query?Jia Chen5 Sep
  • Re: How to optimize a slow query?mos6 Sep
    • Re: How to optimize a slow query?Jia Chen6 Sep
      • Re: How to optimize a slow query?mos6 Sep
        • Re: How to optimize a slow query?Jia Chen6 Sep
          • Re: How to optimize a slow query?mos6 Sep
            • Re: How to optimize a slow query?Jia Chen7 Sep