List:General Discussion« Previous MessageNext Message »
From:mos Date:September 6 2009 10:01pm
Subject:Re: Speeding up a pretty simple correlated update query
View as plain text  
>So I've disabled all the keys on the "item_trans" table -- since I am
>updating every row, it wouldn't (shouldn't) be using the index anyway.

You can't disable unique indexes or primary keys. They are always active. 
You can only deactivate non-unique indexes.

Here are a couple of suggestions.

For now drop the index on item_seq.seq and desc.seq.
Are you sure you have a compound index on item_seq.itemid and 
item_seq.category??

do a "Show create table item_seq" and also "Show create table desc" to see 
what you've got.

Make sure your my.cnf file has
key_buffer_size=500M

equal to about 30% of your available memory. You can always reduce it later.

Of course there is another way of doing it, if you are willing to have the 
tail wag the dog. You may kick yourself for not discovering it yourself. :)

set @num:=0;
set @last:='';
create table new_item_trans select IF(concat(itemid,category),@last, 
@num:=@num+1,@num) Seq, itemid, category, transid, ... 
,@last:=concat(itemid,category) as TMPLast from item_trans order by 
concat(itemid,category);

Now you can use the Alter statement to add your indexes and get rid of the 
TMPLast column.

To build the  item_seq table you would now use:

create table item_seq select seq, itemid, category from new_item_trans 
group by seq, itemid, category;
And of course build your indexes on seq and rename the new_item_trans.

I guarantee you this last solution will not take 9 days to complete! :-)

Mike

At 12:32 PM 9/6/2009, Hank wrote:
>Hello All,
>  I'm reposting this since I didn't get much response the last time, so I'm
>hoping to reach out again.  My correlated update query (see below) was
>running for 9 days before I killed it.   Here is my original question:
>
>   I have a legacy application which was written using a compound primary key
>of an item number (non unique) along with a category ID. The combination of
>the item number and category ID make the records unique.
>
>   I am in the process of replacing the compound (VARCHAR) keys with an
>unique integer key in these tables.
>
>So I have created an item_seq table and assigned a unique sequence number to
>each compound key -- it looks like this (all tables are myisam tables, and
>mysql version 5.0)
>
>desc item_seq;
>+-----------+------------------+------+-----+---------+----------------+
>| Field     | Type             | Null | Key | Default | Extra          |
>+-----------+------------------+------+-----+---------+----------------+
>| seq       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
>| itemid    | char(11)         | NO   | MUL |         |                |
>| category  | char(4)          | NO   |     |         |                |
>+-----------+------------------+------+-----+---------+----------------+
>
>I also have my main transactional table with about 180,000,000 rows -- it
>looks like this:
>
>desc item_trans;
>+-------------+-----------------------+------+-----+---------------------+-------+
>| Field       | Type                  | Null | Key | Default             |
>Extra |
>+-------------+-----------------------+------+-----+---------------------+-------+
>| seq         | int(10) unsigned      | NO   | MUL |                     |
>     |
>| itemid      | char(11)              | NO   | PRI |                     |
>     |
>| category    | char(4)               | NO   | PRI |                     |
>     |
>| transid     | int(10)               | NO   | PRI |                     |
>     |
>
>Currently the "seq" field is null for the entire table.  So of course, I
>want to update the main transaction table with the new sequence number.
>
>So I've disabled all the keys on the "item_trans" table -- since I am
>updating every row, it wouldn't (shouldn't) be using the index anyway.
>
>Here is my correlated update query:
>
>  update item_trans i, item_seq is
>  set i.seq=is.seq
>  where is.itemid=i.itemid and is.category=i.category;
>
>   If I run an explain on the select version of the update, this is what I
>get:
>
>+----+-------------+----------+--------+---------------+--------+---------+------------------------------------------------+-----------+-------+
>| id | select_type | table    | type   | possible_keys | key    | key_len |
>ref                                            | rows      | Extra |
>+----+-------------+----------+--------+---------------+--------+---------+------------------------------------------------+-----------+-------+
>|  1 | SIMPLE      | item_trans    | ALL    | PRIMARY       | NULL   | NULL
>    | NULL                                           | 178948797 |       |
>|  1 | SIMPLE      | item_seq | eq_ref | itemid        | itemid | 20      |
>g.item_trans.itemid,g.item_trans.category                |         1 |
>|
>+----+-------------+----------+--------+---------------+--------+---------+------------------------------------------------+-----------+-------+
>
>... which is exactly what I would expect it to do.  Update every record of
>the item_trans table, and do a full index lookup on the items_seq table.
>
>SO... I've been running this query to update item_trans, and it's been
>running for 5 days now.
>
>I've also tried running this with the primary key index on the item_trans
>table (but not the seq index), and that ran slower in my initial tests.
>
>Are there any faster ways to update 180 million records with a correlated
>update query?  And I'm fairly certain that trying to do this in PHP
>one-record at a time would take much longer than a SQL solution.
>
>Thanks,
>
>-Hank

Thread
Speeding up a pretty simple correlated update queryHank3 Sep
  • RE: Speeding up a pretty simple correlated update queryGavin Towey3 Sep
    • Re: Speeding up a pretty simple correlated update queryHank3 Sep
  • Re: Speeding up a pretty simple correlated update queryHank6 Sep
    • Re: Speeding up a pretty simple correlated update querymos7 Sep
      • Re: Speeding up a pretty simple correlated update queryHank7 Sep