Hi Mike,
Thanks for your detailed answer. Now, I understand what you mean. And,
yes, I agree with you that keeping all data in one table works better
for a bunch of 1:1 relationship tables. Actually, this is what I was
trying to do with that query.
Since you mention "They all had a 1:1 relationship and occasionally some
of the tables did not have a corresponding row." and "From then on I've
merged all 8 tables into one and if any of the subordinate table data
isn't available for a row, its columns are set to NULL", I do want to
ask you about how you set the columns to NULL for rows in subordinate
table data unavailable because I have similar situation.
If I want to combine two tables into one, I think that a full outer join
can achieve what you did. However, MySQL does not have full outer join.
So, I am using
create table rmpdata1
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode,
coalesce(ri.ndate,mv.ndate) as cndate
from
RItime as ri left outer join MVtime as mv
on
(ri.code=mv.code and ri.ndate=mv.ndate))
union
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode,
coalesce(ri.ndate,mv.ndate) as cndate
from
RItime as ri right outer join MVtime as mv
on
(ri.code=mv.code and ri.ndate=mv.ndate));
This query takes more than twice as much time as the query in my first
e-mail. Do you have a better way? Thanks.
Best,
Jia
mos wrote:
> Jia,
>
>> Yes, it is a 1:1 relationship between table RItime and MVtime.
>> However, I don't get your suggestion, "I'd recommend joining the two
>> tables into 1 table so you don't have to join them in the first
>> place." Could you elaborate that?
>
> Sure but first I have to relate it to my own experience. I had 8
> tables of around 25 million rows each. They all had a 1:1 relationship
> and occasionally some of the tables did not have a corresponding row.
> I felt it was better from a design point of view to have 8 different
> tables and do the joins on the tables that I needed for each of my
> particular queries. I'd be joining anywhere from 2 to 5 or 6 or even
> all 8 tables at a time, using a where clause to select 15k rows at a
> time. This is the way to do it from a normalized point of view. All of
> the information is in its respective table and only assemble the
> tables for each particular query.
>
> Well, this was sloooowww! A heck of a lot of work was done to join the
> tables together on a 2 column key (like yours). I also had to run
> maintenance on the tables to see which tables where corrupted or were
> missing rows that should have been there. The tables also repeated
> columns from the other tables like date and product_id that is used to
> help identify each row. Well to make a long story short, it was far
> too much effort to juggle the relationships between all of these tables.
>
> Then a colleague made the monumental announcement by saying "I've
> never found the need to use more than 1 table when there was a 1:1
> relationship. There is a tremendous speed cost involved in piecing the
> data back together. I put all of the data into 1 table". So the light
> went on for me. From then on I've merged all 8 tables into one and if
> any of the subordinate table data isn't available for a row, its
> columns are set to NULL, which is the values they would have had
> anyway after a left join.
>
> I am perfectly happy know with one wide table with over 100 columns.
> Everything is in its place and maintenance is a dream. Queries are
> also quite fast because all of the information is under one table and
> not 8. I don't have to worry about optimizing the indexes for the
> table joins because there aren't any joins between these tables
> because it is all in 1 row.
>
> So you really have to ask yourself, why spend 10 minutes each time
> your query is run? Instead you eliminate the query altogether by
> keeping the data of the 2 tables into 1 table in the first place.
>
> Mike
>
>
>
> At 09:45 AM 9/6/2009, Jia Chen wrote:
>> Thanks for your reply, Mike.
>>
>> Yes, 13419851 rows were added to rmpdata1. However, 10 minutes seem
>> to be too long. I run the same join by using SQL procedure in a
>> statistical software called SAS on a similar machine. It only takes 1
>> minute and 3 seconds.
>>
>> Yes, it is a 1:1 relationship between table RItime and MVtime.
>> However, I don't get your suggestion, "I'd recommend joining the two
>> tables into 1 table so you don't have to join them in the first
>> place." Could you elaborate that?
>>
>> I was using B-tree index. Switching to HASH does help. Now, it takes
>> 4 min 50.17 sec to run the query.
>> I also turn on profiling by using
>> mysql> set profiling = 1;
>> Query OK, 0 rows affected (0.01 sec)
>>
>> After the query finishes, I get
>> mysql> show profile;
>> +----------------------+------------+
>> | Status | Duration |
>> +----------------------+------------+
>> | starting | 0.000123 |
>> | checking permissions | 0.000010 |
>> | Opening tables | 0.000044 |
>> | System lock | 0.000007 |
>> | Table lock | 0.000011 |
>> | init | 0.000083 |
>> | creating table | 0.003428 |
>> | After create | 0.000124 |
>> | System lock | 0.000004 |
>> | Table lock | 0.000051 |
>> | optimizing | 0.000007 |
>> | statistics | 0.000033 |
>> | preparing | 0.000020 |
>> | executing | 0.000004 |
>> | Sending data | 290.153530 |
>> | end | 0.000008 |
>> | end | 0.000004 |
>> | query end | 0.000003 |
>> | freeing items | 0.000010 |
>> | closing tables | 0.000025 |
>> | logging slow query | 0.000001 |
>> | logging slow query | 0.013429 |
>> | cleaning up | 0.000004 |
>> +----------------------+------------+
>> 23 rows in set (0.02 sec)
>>
>> MySQL spends most of its time sending data. According to
>> http://dev.mysql.com/doc/refman/5.0/en/general-thread-states.html,
>> sending data means that "the thread is processing rows for a |SELECT|
>> <http://dev.mysql.com/doc/refman/5.0/en/select.html> statement and
>> also is sending data to the client." Is there more room to optimize
>> this query? Thanks again.
>>
>> Best,
>> Jia
>>
>> mos wrote:
>>> How many rows were added to rmpdata1 table? If it is 13.4 million
>>> rows then it is going to take several minutes to join this many rows
>>> from the 2 tables.
>>> Is there a 1:1 relationship between the two tables or a 1:Many? If
>>> there is a 1:1 then I'd recommend joining the two tables into 1
>>> table so you don't have to join them in the first place.
>>> The only other thing I can suggest is to change the type of index on
>>> the tables being joined to see if that makes a speed difference. For
>>> example, if you are using BTREE then switch to HASH or vice versa.
>>> See http://dev.mysql.com/doc/refman/5.0/en/create-index.html for
>>> more info.
>>>
>>> Mike
>>>
>>> At 10:05 AM 9/5/2009, Jia Chen wrote:
>>>> Hi there,
>>>>
>>>> One simple query took more than 10 minutes. Here is how relevant
>>>> rows in the slow query log looks like:
>>>>
>>>> # Time: 090905 10:49:57
>>>> # User@Host: root[root] @ localhost []
>>>> # Query_time: 649 Lock_time: 0 Rows_sent: 0 Rows_examined: 26758561
>>>> use world;
>>>> create table rmpdata1 select ri.*,
>>>> mv.MV, coalesce(ri.code,mv.code) as ccode,
>>>> coalesce(ri.ndate,mv.ndate) as cndate
>>>> from RItime as ri left outer join
>>>> MVtime as mv
>>>> on (ri.code=mv.code and
>>>> ri.ndate=mv.ndate);
>>>>
>>>> When I "explain" only the select clause, I get
>>>> ------------+----------+-------+
>>>> | id | select_type | table | type | possible_keys | key |
>>>> key_len | ref | rows | Extra |
>>>>
> +----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
>
>>>>
>>>> | 1 | SIMPLE | ri | ALL | NULL | NULL | NULL
>>>> | NULL | 13419851 | |
>>>> | 1 | SIMPLE | mv | eq_ref | PRIMARY | PRIMARY | 11
>>>> | world.ri.code,world.ri.ndate | 1 | |
>>>>
> +----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
>
>>>>
>>>> 2 rows in set (0.00 sec)
>>>>
>>>> I use "show table status from world;" to get information about two
>>>> tables, RItime and MVtime, in the join clause:
>>>> Name: RItime
>>>> Engine: MyISAM
>>>> Version: 10
>>>> Row_format: Dynamic
>>>> Rows: 13419851
>>>> Avg_row_length: 31
>>>> Data_length: 427721848
>>>> Max_data_length: 281474976710655
>>>> Index_length: 347497472
>>>> Data_free: 0
>>>> Auto_increment: NULL
>>>> Create_time: 2009-09-03 10:17:57
>>>> Update_time: 2009-09-03 12:04:02
>>>> Check_time: NULL
>>>> Collation: latin1_swedish_ci
>>>> Checksum: NULL
>>>> Create_options:
>>>> Comment:
>>>> *************************** 2. row ***************************
>>>> Name: MVtime
>>>> Engine: MyISAM
>>>> Version: 10
>>>> Row_format: Dynamic
>>>> Rows: 13562373
>>>> Avg_row_length: 31
>>>> Data_length: 430220056
>>>> Max_data_length: 281474976710655
>>>> Index_length: 350996480
>>>> Data_free: 0
>>>> Auto_increment: NULL
>>>> Create_time: 2009-09-03 13:31:33
>>>> Update_time: 2009-09-03 13:43:51
>>>> Check_time: NULL
>>>> Collation: latin1_swedish_ci
>>>> Checksum: NULL
>>>> Create_options:
>>>> Comment:
>>>>
>>>> I also describe these two tables:
>>>> mysql> desc RItime;
>>>> +-------+------------+------+-----+------------+-------+
>>>> | Field | Type | Null | Key | Default | Extra |
>>>> +-------+------------+------+-----+------------+-------+
>>>> | code | varchar(6) | NO | PRI | | |
>>>> | ndate | date | NO | PRI | 0000-00-00 | |
>>>> | ri | double | YES | | NULL | |
>>>> | time | date | YES | | NULL | |
>>>> | bdate | date | YES | | NULL | |
>>>> +-------+------------+------+-----+------------+-------+
>>>> 5 rows in set (0.00 sec)
>>>>
>>>> mysql> desc MVtime;
>>>> +-------+------------+------+-----+------------+-------+
>>>> | Field | Type | Null | Key | Default | Extra |
>>>> +-------+------------+------+-----+------------+-------+
>>>> | code | varchar(6) | NO | PRI | | |
>>>> | ndate | date | NO | PRI | 0000-00-00 | |
>>>> | MV | double | YES | | NULL | |
>>>> | time | date | YES | | NULL | |
>>>> | bdate | date | YES | | NULL | |
>>>> +-------+------------+------+-----+------------+-------+
>>>> 5 rows in set (0.00 sec)
>>>>
>>>> Could you give me some hint on how to improve the speed of this query?
>>>> Thanks.
>>>>
>>>> Best,
>>>> Jia
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> MySQL General Mailing List
>>>> For list archives: http://lists.mysql.com/mysql
>>>> To unsubscribe:
>>>> http://lists.mysql.com/mysql?unsub=1
>>>
>
>