In the last episode (Aug 19), matt ryan said:
> I think oracle parallel query is calling me
> 110,832,565 stat records
> 77,269,086 on weekly update, I get small daily files, but daily sql's
> dont work very well, and miss records, in this case it missed 563 records.
> mysql> update stat_in set ctasc='321ST';
> Query OK, 77269086 rows affected (24 min 17.60 sec)
> Rows matched: 77269086 Changed: 77269086 Warnings: 0
This is very fast (53000 updates per second). If you are truncating
this table after the following insert, you can skip this step
completely by selecting field1,field2,'321ST',field4 (for example) in
your INSERT statement instead of selecting *.
> mysql> insert ignore into 321st_stat select * from stat_in;
> Query OK, 563 rows affected (1 day 28 min 35.95 sec)
> Records: 77269086 Duplicates: 77268523 Warnings: 0
And this is definitely too slow :) You'll probably have to look at the
mysql stats while this query is running to determine exactly what it's
doing, since mysql can't run EXPLAIN on INSERT commands.
One alternative, since you know you don't have many records to insert,
is to pull the IDs of the missing records and insert just those. Do an
outer join on the two tables (joining on the primary key), get a list
of the IDs of records in stat_in but not in 321st_stat, and add a
"WHERE id IN (list,of,ids)" clause to the end of your INSERT ... SELECT
statement. If you're running 4.1, you can use a subquery and embed the
first query directly in the INSERT.