List:General Discussion« Previous MessageNext Message »
From:Gavin Towey Date:September 2 2009 11:50pm
Subject:RE: Speeding up a pretty simple correlated update query
View as plain text  
Do you know that if you create seq column on the original table as an auto_increment
primary key, it will fill in the numbers automatically?  There's no need to create the
values on another table and update with a join.

Regards,
Gavin Towey

-----Original Message-----
From: Hank [mailto:heskin@stripped]
Sent: Wednesday, September 02, 2009 4:35 PM
To: mysql@stripped
Subject: Speeding up a pretty simple correlated update query

Hello All,
  I have a legacy application which was written using a compound primary key
of an item number (non unique) along with a category ID. The combination of
the item number and category ID make the records unique.

  I am in the process of replacing the compound (VARCHAR) keys with an
unique integer key in these tables.

So I have created an item_seq table and assigned a unique sequence number to
each compound key -- it looks like this (all tables are myisam tables, and
mysql version 5.0)

desc item_seq;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| seq       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| itemid    | char(11)         | NO   | MUL |         |                |
| category  | char(4)          | NO   |     |         |                |
+-----------+------------------+------+-----+---------+----------------+

I also have my main transactional table with about 180,000,000 rows -- it
looks like this:

desc item_trans;
+-------------+-----------------------+------+-----+---------------------+-------+
| Field       | Type                  | Null | Key | Default             |
Extra |
+-------------+-----------------------+------+-----+---------------------+-------+
| seq         | int(10) unsigned      | NO   | MUL |                     |
    |
| itemid      | char(11)              | NO   | PRI |                     |
    |
| category    | char(4)               | NO   | PRI |                     |
    |
| transid     | int(10)               | NO   | PRI |                     |
    |

Currently the "seq" field is null for the entire table.  So of course, I
want to update the main transaction table with the new sequence number.

So I've disabled all the keys on the "item_trans" table -- since I am
updating every row, it wouldn't (shouldn't) be using the index anyway.

Here is my correlated update query:

 update item_trans i, item_seq is
 set i.seq=is.seq
 where is.itemid=i.itemid and is.category=i.category;

  If I run an explain on the select version of the update, this is what I
get:

+----+-------------+----------+--------+---------------+--------+---------+------------------------------------------------+-----------+-------+
| id | select_type | table    | type   | possible_keys | key    | key_len |
ref                                            | rows      | Extra |
+----+-------------+----------+--------+---------------+--------+---------+------------------------------------------------+-----------+-------+
|  1 | SIMPLE      | item_trans    | ALL    | PRIMARY       | NULL   | NULL
   | NULL                                           | 178948797 |       |
|  1 | SIMPLE      | item_seq | eq_ref | itemid        | itemid | 20      |
g.item_trans.itemid,g.item_trans.category                |         1 |
|
+----+-------------+----------+--------+---------------+--------+---------+------------------------------------------------+-----------+-------+

... which is exactly what I would expect it to do.  Update every record of
the item_trans table, and do a full index lookup on the items_seq table.

SO... I've been running this query to update item_trans, and it's been
running for 5 days now.

I've also tried running this with the primary key index on the item_trans
table (but not the seq index), and that ran slower in my initial tests.

Are there any faster ways to update 180 million records with a correlated
update query?  And I'm fairly certain that trying to do this in PHP
one-record at a time would take much longer than a SQL solution.

Thanks,

-Hank

The information contained in this transmission may contain privileged and confidential
information. It is intended only for the use of the person(s) named above. If you are not
the intended recipient, you are hereby notified that any review, dissemination,
distribution or duplication of this communication is strictly prohibited. If you are not
the intended recipient, please contact the sender by reply email and destroy all copies
of the original message.
Thread
Speeding up a pretty simple correlated update queryHank3 Sep
  • RE: Speeding up a pretty simple correlated update queryGavin Towey3 Sep
    • Re: Speeding up a pretty simple correlated update queryHank3 Sep
  • Re: Speeding up a pretty simple correlated update queryHank6 Sep
    • Re: Speeding up a pretty simple correlated update querymos7 Sep
      • Re: Speeding up a pretty simple correlated update queryHank7 Sep