On Mar 22, 2009, at 10:52 PM, Andrej van der Zee wrote:
> "When loading a table from a text file, use LOAD DATA INFILE. This
> is usually 20 times faster than using INSERT statements."
I wish they'd said whether their test used many single-row INSERT
statements, or multi-statements. It could be that the latter
approaches the speed of LOAD DATA INFILE.
Since we can't trust that report, you should do your own benchmarks.
> I was wondering if there are any plans by MySQL++ to wrap this up
> into some easy methods.
There's nothing on the Wishlist for this, no.
Since you're going to write the code anyway, consider making it a
change to Query rather than code specific to your project. It might
look like this:
bool Query::load_file(
const char* path,
bool local_path,
const char* table,
const char* field_terminator = 0, // default \t
const char* field_encloser = 0, // default none
const char* field_escape = 0, // default \
const char* line_terminator = 0, // \n on *ix, \r\n on Windows
const char* line_starter = 0, // default none
const char* character_set = 0, // default UTF-8
const char* comment_prefix = 0, // ignore no lines
bool replace_existing = false,
bool ignore_duplicates = false,
bool low_priority = false,
int skip_lines = 0);
This is a fairly ugly interface, so there could be some common
wrappers, too.
load_local_tab_file() would be the same as above, but with as many of
the default parameters above as we can assume for a tab-delimited
files eliminated.
load_local_csv_file() would be similar, but with the default
parameters set for traditional CSV files.
load_remote_*() should also exist.
There should also be a similar Query::save_file() set of interfaces.
> I can imagine that by writing an SSQLS collection to a temporary
> file and using it in "LOAD DATA INFILE" could outperform any
> (binary) bulk-insert by far!
I think you're ignoring how slow hard disks are. Unless you have a
fast RAID (as opposed to slow RAID, which still is pretty common) it's
going to be at least a few times slower than your network link, so it
*will* materially affect the upload speed. It's even worse if the
MySQL server is on the same machine as the client.
We could maybe hack something up with named pipes to avoid that
overhead, but I'd want to see benchmark results that prove that multi-
statement INSERT is indeed many times slower than LOAD DATA INFILE
before embarking on that sort of hacky effort. It would require a
different implementation on POSIX and Windows, for one thing.
If the benchmarks prove multi-statement INSERT to be as fast, or
nearly as fast, as LOAD DATA INFILE, load_file() and friends are still
useful for the case where you have (or need) delimited text files of
data. The main consequence is that it means there's no need to try to
create a sort of LOAD DATA FROM SSQLS feature; we already have it.