List:General Discussion« Previous MessageNext Message »
From:mos Date:September 6 2009 5:12pm
Subject:Re: How to optimize a slow query?
View as plain text  
Jia,

>Yes, it is a 1:1 relationship between table RItime and MVtime. However, I 
>don't get your suggestion, "I'd recommend joining the two tables into 1 
>table so you don't have to join them in the first place."  Could you 
>elaborate that?

Sure but first I have to relate it to my own experience. I had 8 tables of 
around 25 million rows each. They all had a 1:1 relationship and 
occasionally some of the tables did not have a corresponding row. I felt it 
was better from a design point of view to have 8 different tables and do 
the joins on the tables that I needed for each of my particular queries. 
I'd be joining anywhere from 2 to 5 or 6 or even all 8 tables at a time, 
using a where clause to select 15k rows at a time. This is the way to do it 
from a normalized point of view. All of the information is in its 
respective table and only assemble the tables for each particular query.

Well, this was sloooowww! A heck of a lot of work was done to join the 
tables together on a 2 column key (like yours). I also had to run 
maintenance on the tables to see which tables where corrupted or were 
missing rows that should have been there. The tables also repeated columns 
from the other tables like date and product_id that is used to help 
identify each row. Well to make a long story short, it was far too much 
effort to juggle the relationships between all of these tables.

Then a colleague made the monumental announcement by saying "I've never 
found the need to use more than 1 table when there was a 1:1 relationship. 
There is a tremendous speed cost involved in piecing the data back 
together. I put all of the data into 1 table".  So the light went on for 
me. From then on I've merged all 8 tables into one and if any of the 
subordinate table data isn't available for a row, its columns are set to 
NULL, which is the values they would have had anyway after a left join.

I am perfectly happy know with one wide table with over 100 columns. 
Everything is in its place and maintenance is a dream. Queries are also 
quite fast because all of the information is under one table and not 8. I 
don't have to worry about optimizing the indexes for the table joins 
because there aren't any joins between these tables because it is all in 1 row.

So you really have to ask yourself, why spend  10 minutes each time your 
query is run? Instead you eliminate the query altogether by keeping the 
data of the 2 tables into 1 table in the first place.

Mike



At 09:45 AM 9/6/2009, Jia Chen wrote:
>Thanks for your reply, Mike.
>
>Yes, 13419851 rows were added to rmpdata1. However, 10 minutes seem to be 
>too long. I run the same join by using SQL procedure in a statistical 
>software called SAS on a similar machine. It only takes 1 minute and 3 seconds.
>
>Yes, it is a 1:1 relationship between table RItime and MVtime. However, I 
>don't get your suggestion, "I'd recommend joining the two tables into 1 
>table so you don't have to join them in the first place."  Could you 
>elaborate that?
>
>I was using B-tree index. Switching to HASH does help. Now, it takes 4 min 
>50.17 sec to run the query.
>I also turn on profiling by using
>mysql> set profiling = 1;
>Query OK, 0 rows affected (0.01 sec)
>
>After the query finishes, I get
>mysql> show profile;
>+----------------------+------------+
>| Status               | Duration   |
>+----------------------+------------+
>| starting             |   0.000123 |
>| checking permissions |   0.000010 |
>| Opening tables       |   0.000044 |
>| System lock          |   0.000007 |
>| Table lock           |   0.000011 |
>| init                 |   0.000083 |
>| creating table       |   0.003428 |
>| After create         |   0.000124 |
>| System lock          |   0.000004 |
>| Table lock           |   0.000051 |
>| optimizing           |   0.000007 |
>| statistics           |   0.000033 |
>| preparing            |   0.000020 |
>| executing            |   0.000004 |
>| Sending data         | 290.153530 |
>| end                  |   0.000008 |
>| end                  |   0.000004 |
>| query end            |   0.000003 |
>| freeing items        |   0.000010 |
>| closing tables       |   0.000025 |
>| logging slow query   |   0.000001 |
>| logging slow query   |   0.013429 |
>| cleaning up          |   0.000004 |
>+----------------------+------------+
>23 rows in set (0.02 sec)
>
>MySQL spends most of its time sending data. According to 
>http://dev.mysql.com/doc/refman/5.0/en/general-thread-states.html, sending 
>data means that "the thread is processing rows for a |SELECT| 
><http://dev.mysql.com/doc/refman/5.0/en/select.html> statement and also is 
>sending data to the client."  Is there more room to optimize this 
>query?  Thanks again.
>
>Best,
>Jia
>
>mos wrote:
>>How many rows were added to rmpdata1 table? If it is 13.4 million rows 
>>then it is going to take several minutes to join this many rows from the 
>>2 tables.
>>Is there a 1:1 relationship between the two tables or a 1:Many? If there 
>>is a 1:1 then I'd recommend joining the two tables into 1 table so you 
>>don't have to join them in the first place.
>>The only other thing I can suggest is to change the type of index on the 
>>tables being joined to see if that makes a speed difference. For example, 
>>if you are using BTREE then switch to HASH or vice versa. See 
>>http://dev.mysql.com/doc/refman/5.0/en/create-index.html for more info.
>>
>>Mike
>>
>>At 10:05 AM 9/5/2009, Jia Chen wrote:
>>>Hi there,
>>>
>>>One simple query took more than 10 minutes.  Here is how relevant rows 
>>>in the slow query log looks like:
>>>
>>># Time: 090905 10:49:57
>>># User@Host: root[root] @ localhost []
>>># Query_time: 649  Lock_time: 0  Rows_sent: 0  Rows_examined: 26758561
>>>use world;
>>>create table rmpdata1 select ri.*,
>>>mv.MV, coalesce(ri.code,mv.code) as ccode,
>>>coalesce(ri.ndate,mv.ndate) as cndate
>>>from RItime as ri left outer join
>>>MVtime as mv
>>>on (ri.code=mv.code and
>>>ri.ndate=mv.ndate);
>>>
>>>When I "explain" only the select clause, I get
>>>------------+----------+-------+
>>>| id | select_type | table | type   | possible_keys | key     | key_len 
>>>| ref                          | rows     | Extra |
>>>+----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
> 
>>>
>>>|  1 | SIMPLE      | ri    | ALL    | NULL          | NULL    | NULL
>>>| NULL                         | 13419851 |       |
>>>|  1 | SIMPLE      | mv    | eq_ref | PRIMARY       | PRIMARY | 11
>>>| world.ri.code,world.ri.ndate |        1 |       |
>>>+----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
> 
>>>
>>>2 rows in set (0.00 sec)
>>>
>>>I use "show table status from world;" to get information about two 
>>>tables, RItime and MVtime, in the join clause:
>>>           Name: RItime
>>>         Engine: MyISAM
>>>        Version: 10
>>>     Row_format: Dynamic
>>>           Rows: 13419851
>>>Avg_row_length: 31
>>>    Data_length: 427721848
>>>Max_data_length: 281474976710655
>>>   Index_length: 347497472
>>>      Data_free: 0
>>>Auto_increment: NULL
>>>    Create_time: 2009-09-03 10:17:57
>>>    Update_time: 2009-09-03 12:04:02
>>>     Check_time: NULL
>>>      Collation: latin1_swedish_ci
>>>       Checksum: NULL
>>>Create_options:
>>>        Comment:
>>>*************************** 2. row ***************************
>>>           Name: MVtime
>>>         Engine: MyISAM
>>>        Version: 10
>>>     Row_format: Dynamic
>>>           Rows: 13562373
>>>Avg_row_length: 31
>>>    Data_length: 430220056
>>>Max_data_length: 281474976710655
>>>   Index_length: 350996480
>>>      Data_free: 0
>>>Auto_increment: NULL
>>>    Create_time: 2009-09-03 13:31:33
>>>    Update_time: 2009-09-03 13:43:51
>>>     Check_time: NULL
>>>      Collation: latin1_swedish_ci
>>>       Checksum: NULL
>>>Create_options:
>>>        Comment:
>>>
>>>I also describe these two tables:
>>>mysql> desc RItime;
>>>+-------+------------+------+-----+------------+-------+
>>>| Field | Type       | Null | Key | Default    | Extra |
>>>+-------+------------+------+-----+------------+-------+
>>>| code  | varchar(6) | NO   | PRI |            |       |
>>>| ndate | date       | NO   | PRI | 0000-00-00 |       |
>>>| ri    | double     | YES  |     | NULL       |       |
>>>| time  | date       | YES  |     | NULL       |       |
>>>| bdate | date       | YES  |     | NULL       |       |
>>>+-------+------------+------+-----+------------+-------+
>>>5 rows in set (0.00 sec)
>>>
>>>mysql> desc MVtime;
>>>+-------+------------+------+-----+------------+-------+
>>>| Field | Type       | Null | Key | Default    | Extra |
>>>+-------+------------+------+-----+------------+-------+
>>>| code  | varchar(6) | NO   | PRI |            |       |
>>>| ndate | date       | NO   | PRI | 0000-00-00 |       |
>>>| MV    | double     | YES  |     | NULL       |       |
>>>| time  | date       | YES  |     | NULL       |       |
>>>| bdate | date       | YES  |     | NULL       |       |
>>>+-------+------------+------+-----+------------+-------+
>>>5 rows in set (0.00 sec)
>>>
>>>Could you give me some hint on how to improve the speed of this query?
>>>Thanks.
>>>
>>>Best,
>>>Jia
>>>
>>>
>>>
>>>
>>>--
>>>MySQL General Mailing List
>>>For list archives: http://lists.mysql.com/mysql
>>>To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
>>

Thread
How to optimize a slow query?Jia Chen5 Sep
  • Re: How to optimize a slow query?mos6 Sep
    • Re: How to optimize a slow query?Jia Chen6 Sep
      • Re: How to optimize a slow query?mos6 Sep
        • Re: How to optimize a slow query?Jia Chen6 Sep
          • Re: How to optimize a slow query?mos6 Sep
            • Re: How to optimize a slow query?Jia Chen7 Sep