Jon Frisby wrote:
> Everyone,
>
> We're trying to do some bulk data loads on several different tables (on
> several different machines, using several different techniques) and
> seeing dramatically worse-than-linear performance.
>
> We've tried the bulk-INSERT syntax, and the LOAD DATA INFILE syntax.
> We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where
> appropriate), and so forth.
>
> The one that is the most immediate concern is a table of the form:
>
> CREATE TABLE `test` (
> `email` varchar(255) NOT NULL default '',
> `when_happened` datetime NOT NULL default '0000-00-00 00:00:00',
> UNIQUE KEY `email` (`email`),
> KEY `when_happened` (`when_happened`)
> ) TYPE=InnoDB;
>
> I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows
> each (~135MB files). The first chunk was very quick (about 1.5
> minutes), but the tenth chunk has taken 22.6 hours and is still going.
> (It's been getting progessively slower with each chunk...)
>
> The database is our main sites database but we've dramatically reduced
> the load on that machine over the past couple months through careful
> optimization of our code. The box is a dual, dual-core Opteron, 8GB of
> RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of
> course). We have 1GB allocated to the buffer pool, and our usual 1GB *
> 3 log files. 8 I/O threads.
>
> Load on the box sits at around 6-7, with a large (>50%) amount of time
> spent in wait state, but actual disk throughput to our software RAID
> array (No longer on a SAN...) is quite low -- 6-9k blocks/s out, 1-6k
> blocks/s in.
>
> Something *has* to be wrong here, but we're not sure what we've missed.
> We've restored larger data sets from a mysqldump in the past in
> dramatically less time on far inferior hardware. (A superset of this
> same data to a schema which is also a superset, PLUS a bunch of other
> rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual
> Xeon w/ 4GB of RAM)
>
> We're inclined to believe that this is a configuration problem, as
> opposed to a driver or hardware problem given the non-linear nature of
> the performance degradation. This implies we're doing something truly
> stupid with our loads. What could cause this kind of strangeness?
>
> -JF
>
Hi Jon,
I experienced this same non-linear degradation during large imports,
exactly like you are describing, about 18 months ago. I don't remember
if I found a specific cause, but I am fairly certain that it was related
to a few issues, and that we did resolve it. I have not seen this happen
with MyISAM tables, and we were able import our full data by breaking it
into chunks and waiting between each chunk, so I believe it to be
related to InnoDB's logs in some way. Since you are already importing
your data in chunks, try making each chunk a separate transaction, or
waiting until disk activity slows to load the next chunk. Also, there
have been major improvements to InnoDB in the later 4.1 releases, so if
possible, I would suggest upgrading.
Hope that helps!
Devananda vdv