Thanks for your reply. The recommendation of loading into innodb and then altering the
storage engine to ndbcluster sounded promising, but I tested it (with the recommended
ndb_batch_size=8*1024*1024) and it took about the same amount of time as just loading
directly into ndb. The table does have two text columns but I can't get rid of those. It
doesn't have any auto increment columns or indexes.
Splitting the file with the GNU split utility and then kicking off a bunch of LOAD DATA
commands in parallel was definitely a lot faster. In fact, I found the load speed scaled
basically linearly with the number of files, even when I went beyond the number of CPU
cores available on the VMs. I only went up to 7 files, but I'll try more as well.
I guess this is the way to go if there are no other options that can help speed this up.
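
For anyone wanting to reproduce the split-and-load-in-parallel approach, here is a rough sketch in Python. It is not the exact procedure used above (that was GNU split plus separate load data commands). The function names, the round-robin splitting, the use of LOAD DATA LOCAL INFILE, and the database and table names are all assumptions made for illustration. It also assumes one row per line, so it would need adjusting if the text columns contain embedded newlines.

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_lines(src, out_dir, n_chunks):
    """Distribute the lines of src round-robin over n_chunks files.

    Assumes one row per line; a row is never cut in the middle.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i:02d}.txt" for i in range(n_chunks)]
    handles = [p.open("wb") for p in paths]
    try:
        with open(src, "rb") as f:
            # zip stops as soon as the input file is exhausted
            for line, out in zip(f, itertools.cycle(handles)):
                out.write(line)
    finally:
        for h in handles:
            h.close()
    return paths


def load_statement(chunk_path, table):
    """Build the SQL for one chunk (hypothetical table name, default delimiters)."""
    return f"LOAD DATA LOCAL INFILE '{chunk_path}' INTO TABLE {table}"


def load_all(paths, table, database):
    """Run one mysql client per chunk, all at the same time.

    Assumes the mysql client is on PATH, credentials come from ~/.my.cnf,
    and the server allows LOCAL INFILE.
    """
    def run(path):
        sql = load_statement(Path(path).resolve().as_posix(), table)
        cmd = ["mysql", "--local-infile=1", database, "-e", sql]
        return subprocess.run(cmd, check=True)

    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        # list() forces all results so that a failed load raises here
        list(pool.map(run, paths))
```

Usage would be something like `load_all(split_lines("data.txt", "chunks", 7), "mytable", "mydb")`, where the file, table, and database names are placeholders. Threads are enough here because each one only waits on an external mysql process, which fits the observation that going beyond the number of CPU cores still helped.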
From: Johan Andersson [mailto:johan@stripped]
Sent: Wednesday, May 09, 2012 4:01 PM
To: Scott Sandler
Subject: Re: NDBCluster Load Data Infile extremely slow
* Split the load data into several files and load in parallel.
* avoid blob/text if you can
* set ndb_batch_size
* if you have auto_increments, ndb_autoincrement_prefetch_sz makes a big difference
See more here about the above:
On Wed, May 9, 2012 at 9:47 PM, Scott Sandler wrote:
I've found that the performance of LOAD DATA INFILE for an NDBCluster is over 130x slower
than innodb. I expect it to be slower since it has to insert into multiple nodes and push
data over the network, but the performance difference I'm seeing is 20,000 rows per second
for innodb vs. 150 rows per second into ndbcluster (with the same schema/hardware/etc.).
This slowness is quite a big road block in actually being able to migrate to MySQL
Here's a pastebin of my config.ini<http://pastebin.com/JhrjdXKH>, and another of my
my.cnf<http://pastebin.com/9y1mY7zm>. Are there any parameters I can change or
anything I can try to speed up the ndb data loading?