From: Date: October 11 2005 6:25am Subject: Re: Non-linear degradation in bulk loads? List-Archive: http://lists.mysql.com/mysql/190220 Message-Id: <434B3EBA.1040202@yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Jon Frisby wrote: > Everyone, > > We're trying to do some bulk data loads on several different tables (on > several different machines, using several different techniques) and > seeing dramatically worse-than-linear performance. > > We've tried the bulk-INSERT syntax, and the LOAD DATA INFILE syntax. > We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where > appropriate), and so forth. > > The one that is the most immediate concern is a table of the form: > > CREATE TABLE `test` ( > `email` varchar(255) NOT NULL default '', > `when_happened` datetime NOT NULL default '0000-00-00 00:00:00', > UNIQUE KEY `email` (`email`), > KEY `when_happened` (`when_happened`) > ) TYPE=InnoDB; > > I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows > each (~135MB files). The first chunk was very quick (about 1.5 > minutes), but the tenth chunk has taken 22.6 hours and is still going. > (It's been getting progessively slower with each chunk...) > > The database is our main sites database but we've dramatically reduced > the load on that machine over the past couple months through careful > optimization of our code. The box is a dual, dual-core Opteron, 8GB of > RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of > course). We have 1GB allocated to the buffer pool, and our usual 1GB * > 3 log files. 8 I/O threads. > > Load on the box sits at around 6-7, with a large (>50%) amount of time > spent in wait state, but actual disk throughput to our software RAID > array (No longer on a SAN...) is quite low -- 6-9k blocks/s out, 1-6k > blocks/s in. > > Something *has* to be wrong here, but we're not sure what we've missed. > We've restored larger data sets from a mysqldump in the past in > dramatically less time on far inferior hardware. (A superset of this > same data to a schema which is also a superset, PLUS a bunch of other > rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual > Xeon w/ 4GB of RAM) > > We're inclined to believe that this is a configuration problem, as > opposed to a driver or hardware problem given the non-linear nature of > the performance degradation. This implies we're doing something truly > stupid with our loads. What could cause this kind of strangeness? > > -JF > Hi Jon, I experienced this same non-linear degradation during large imports, exactly like you are describing, about 18 months ago. I don't remember if I found a specific cause, but I am fairly certain that it was related to a few issues, and that we did resolve it. I have not seen this happen with MyISAM tables, and we were able import our full data by breaking it into chunks and waiting between each chunk, so I believe it to be related to InnoDB's logs in some way. Since you are already importing your data in chunks, try making each chunk a separate transaction, or waiting until disk activity slows to load the next chunk. Also, there have been major improvements to InnoDB in the later 4.1 releases, so if possible, I would suggest upgrading. Hope that helps! Devananda vdv