From: Date: October 11 2005 6:04pm Subject: Re: Non-linear degradation in bulk loads? List-Archive: http://lists.mysql.com/mysql/190252 Message-Id: <017401c5ce7d$74f5d930$155110ac@hebis> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit Jon, my guess is that the inserts to the UNIQUE secondary index cause the workload to be seriously disk-bound. Two solutions: 1) sort the rows to be inserted on the key 'email' before inserting. 2) Or: http://dev.mysql.com/doc/mysql/en/innodb-tuning.html " If you have UNIQUE constraints on secondary keys, starting from MySQL 3.23.52 and 4.0.3, you can speed up table imports by temporarily turning off the uniqueness checks during the import session: SET UNIQUE_CHECKS=0; For big tables, this saves a lot of disk I/O because InnoDB can use its insert buffer to write secondary index records in a batch. " But make sure you do not have any duplicates in the rows! Note that now you can get support on this MySQL mailing list from a Vice President of Oracle. I hope that the level of support improves. Best regards, Heikki Vice President, server technology Oracle/Innobase Oy ................... Everyone, We're trying to do some bulk data loads on several different tables (on several different machines, using several different techniques) and seeing dramatically worse-than-linear performance. We've tried the bulk-INSERT syntax, and the LOAD DATA INFILE syntax. We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where appropriate), and so forth. The one that is the most immediate concern is a table of the form: CREATE TABLE `test` ( `email` varchar(255) NOT NULL default '', `when_happened` datetime NOT NULL default '0000-00-00 00:00:00', UNIQUE KEY `email` (`email`), KEY `when_happened` (`when_happened`) ) TYPE=InnoDB; I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows each (~135MB files). The first chunk was very quick (about 1.5 minutes), but the tenth chunk has taken 22.6 hours and is still going. (It's been getting progessively slower with each chunk...) The database is our main sites database but we've dramatically reduced the load on that machine over the past couple months through careful optimization of our code. The box is a dual, dual-core Opteron, 8GB of RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of course). We have 1GB allocated to the buffer pool, and our usual 1GB * 3 log files. 8 I/O threads. Load on the box sits at around 6-7, with a large (>50%) amount of time spent in wait state, but actual disk throughput to our software RAID array (No longer on a SAN...) is quite low -- 6-9k blocks/s out, 1-6k blocks/s in. Something *has* to be wrong here, but we're not sure what we've missed. We've restored larger data sets from a mysqldump in the past in dramatically less time on far inferior hardware. (A superset of this same data to a schema which is also a superset, PLUS a bunch of other rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual Xeon w/ 4GB of RAM) We're inclined to believe that this is a configuration problem, as opposed to a driver or hardware problem given the non-linear nature of the performance degradation. This implies we're doing something truly stupid with our loads. What could cause this kind of strangeness? -JF