List:General Discussion« Previous MessageNext Message »
From:mos Date:October 11 2005 11:35pm
Subject:Re: Non-linear degradation in bulk loads?
View as plain text  
At 11:41 PM 10/10/2005, you wrote:
>Hi Jon,
>Well, may be the next suggestions might help you.
>Disable Keys does apply to non-unique keys only.
>So I suggest to focus on your unique Email key.
>You could do some tests with:
>a. drop the unique key on Email
>b. load the various bulks
>c. after loading, define Email, eg. as (unique) Primary Key.
>After b. you can do also ALTER TABLE ORDER BY Email,
>to get Email index in pace with the physical order of the data.
>Hope it helps ...
>Best wishes, Cor
>
>
>----- Original Message ----- From: "Jon Frisby" <jfrisby@stripped>
>To: <mysql@stripped>
>Sent: Tuesday, October 11, 2005 4:12 AM
>Subject: Non-linear degradation in bulk loads?
>
>
>Everyone,
>
>We're trying to do some bulk data loads on several different tables (on
>several different machines, using several different techniques) and
>seeing dramatically worse-than-linear performance.
>
>We've tried the bulk-INSERT syntax, and the LOAD DATA INFILE syntax.
>We've done ALTER TABLE ... DISABLE KEYS, SET FOREIGN_KEY_CHECKS=0 (where
>appropriate), and so forth.
>
>The one that is the most immediate concern is a table of the form:
>
>CREATE TABLE `test` (
>  `email` varchar(255) NOT NULL default '',
>  `when_happened` datetime NOT NULL default '0000-00-00 00:00:00',
>  UNIQUE KEY `email` (`email`),
>  KEY `when_happened` (`when_happened`)
>) TYPE=InnoDB;
>
>I'm loading data using LOAD DATA INFILE with chunks containing 3.4m rows
>each (~135MB files).  The first chunk was very quick (about 1.5
>minutes), but the tenth chunk has taken 22.6 hours and is still going.
>(It's been getting progessively slower with each chunk...)
>
>The database is our main sites database but we've dramatically reduced
>the load on that machine over the past couple months through careful
>optimization of our code.  The box is a dual, dual-core Opteron, 8GB of
>RAM running a 32-bit Linux 2.4 kernel and MySQL 4.0.20 (32-bit of
>course).  We have 1GB allocated to the buffer pool, and our usual 1GB *
>3 log files.  8 I/O threads.
>
>Load on the box sits at around 6-7, with a large (>50%) amount of time
>spent in wait state, but actual disk throughput to our software RAID
>array (No longer on a SAN...) is quite low -- 6-9k blocks/s out, 1-6k
>blocks/s in.
>
>Something *has* to be wrong here, but we're not sure what we've missed.
>We've restored larger data sets from a mysqldump in the past in
>dramatically less time on far inferior hardware. (A superset of this
>same data to a schema which is also a superset, PLUS a bunch of other
>rather large tables -- all in ~8 hours on a 3Ware RAID array on a dual
>Xeon w/ 4GB of RAM)
>
>We're inclined to believe that this is a configuration problem, as
>opposed to a driver or hardware problem given the non-linear nature of
>the performance degradation.  This implies we're doing something truly
>stupid with our loads.  What could cause this kind of strangeness?
>
>-JF

JF,
         It's likely the indexes that are causing the slowdown. If you 
remove all indexes from the table definition and start with an empty table, 
you should see a dramatic speed increase. When all the data has been 
loaded, use one Alter Table command to rebuild all the indexes.  You could 
try to optimize the table after each load to see if that speeds things up 
(this will rebalance the index distribution). I was able to load 100 
million rows relatively fast, but failed miserably at 500 million rows 
because of insufficient memory.

Mike

P.S. I suppose you already know when you use Load Data to load data into an 
empty table, it won't update the indexes until the load has completed. This 
is why loading data into an empty table is much faster than loading data 
into a table that has rows in it (even if it is only 1 row). 

Thread
Non-linear degradation in bulk loads?Jon Frisby11 Oct
  • Re: Non-linear degradation in bulk loads?Manoj11 Oct
  • Re: Non-linear degradation in bulk loads?C.R. Vegelin11 Oct
    • Re: Non-linear degradation in bulk loads?mos11 Oct
  • Re: Non-linear degradation in bulk loads?Devananda11 Oct
RE: Non-linear degradation in bulk loads?Jon Frisby11 Oct
Re: Non-linear degradation in bulk loads?Heikki Tuuri11 Oct
RE: Non-linear degradation in bulk loads?Jon Frisby12 Oct
  • Re: Non-linear degradation in bulk loads?Heikki Tuuri12 Oct
RE: Non-linear degradation in bulk loads?Jon Frisby14 Oct
Re: Non-linear degradation in bulk loads?Heikki Tuuri16 Oct
RE: Non-linear degradation in bulk loads?Jon Frisby17 Oct
RE: Non-linear degradation in bulk loads?Jon Frisby17 Oct
  • Re: Non-linear degradation in bulk loads?Heikki Tuuri17 Oct
RE: Non-linear degradation in bulk loads?Jon Frisby17 Oct
  • Re: Non-linear degradation in bulk loads?Heikki Tuuri17 Oct
RE: Non-linear degradation in bulk loads?Jon Frisby17 Oct
RE: Non-linear degradation in bulk loads?Jon Frisby18 Oct
Re: Non-linear degradation in bulk loads?Heikki Tuuri18 Oct