List:MySQL++« Previous MessageNext Message »
From:Warren Young Date:March 23 2009 9:52am
Subject:Re: binary bulk-insert
View as plain text  
On Mar 22, 2009, at 10:52 PM, Andrej van der Zee wrote:

> "When loading a table from a text file, use LOAD DATA INFILE. This  
> is usually 20 times faster than using INSERT statements."

I wish they'd said whether their test used many single-row INSERT  
statements, or multi-statements.  It could be that the latter  
approaches the speed of LOAD DATA INFILE.

Since we can't trust that report, you should do your own benchmarks.

> I was wondering if there are any plans by MySQL++ to wrap this up  
> into some easy methods.

There's nothing on the Wishlist for this, no.

Since you're going to write the code anyway, consider making it a  
change to Query rather than code specific to your project.  It might  
look like this:

	bool Query::load_file(
		const char* path,
		bool local_path,
		const char* table,
		const char* field_terminator = 0,	// default \t
		const char* field_encloser = 0,		// default none
		const char* field_escape = 0,		// default \
		const char* line_terminator = 0,	// \n on *ix, \r\n on Windows
		const char* line_starter = 0,		// default none
		const char* character_set = 0,		// default UTF-8
		const char* comment_prefix = 0,		// ignore no lines
		bool replace_existing = false,
		bool ignore_duplicates = false,
		bool low_priority = false,
		int skip_lines = 0);

This is a fairly ugly interface, so there could be some common  
wrappers, too.

load_local_tab_file() would be the same as above, but with as many of  
the default parameters above as we can assume for a tab-delimited  
files eliminated.

load_local_csv_file() would be similar, but with the default  
parameters set for traditional CSV files.

load_remote_*() should also exist.

There should also be a similar Query::save_file() set of interfaces.

> I can imagine that by writing an SSQLS collection to a temporary  
> file and using it in "LOAD DATA INFILE" could outperform any  
> (binary) bulk-insert by far!

I think you're ignoring how slow hard disks are.  Unless you have a  
fast RAID (as opposed to slow RAID, which still is pretty common) it's  
going to be at least a few times slower than your network link, so it  
*will* materially affect the upload speed.  It's even worse if the  
MySQL server is on the same machine as the client.

We could maybe hack something up with named pipes to avoid that  
overhead, but I'd want to see benchmark results that prove that multi- 
statement INSERT is indeed many times slower than LOAD DATA INFILE  
before embarking on that sort of hacky effort.  It would require a  
different implementation on POSIX and Windows, for one thing.

If the benchmarks prove multi-statement INSERT to be as fast, or  
nearly as fast, as LOAD DATA INFILE, load_file() and friends are still  
useful for the case where you have (or need) delimited text files of  
data.  The main consequence is that it means there's no need to try to  
create a sort of LOAD DATA FROM SSQLS feature; we already have it.
binary bulk-insertAndrej van der Zee13 Mar
  • Re: binary bulk-insertWarren Young13 Mar
Re: binary bulk-insertAndrej van der Zee13 Mar
  • Re: binary bulk-insertWarren Young13 Mar
Re: binary bulk-insertAndrej van der Zee23 Mar
  • Re: binary bulk-insertWarren Young23 Mar
Re: binary bulk-insertAndrej van der Zee23 Mar