From: Warren Young Date: March 23 2009 9:52am Subject: Re: binary bulk-insert List-Archive: http://lists.mysql.com/plusplus/8500 Message-Id: <4754DE3E-8ECD-468C-BBC1-340450FC99A0@etr-usa.com> MIME-Version: 1.0 (Apple Message framework v930.3) Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit On Mar 22, 2009, at 10:52 PM, Andrej van der Zee wrote: > "When loading a table from a text file, use LOAD DATA INFILE. This > is usually 20 times faster than using INSERT statements." I wish they'd said whether their test used many single-row INSERT statements, or multi-statements. It could be that the latter approaches the speed of LOAD DATA INFILE. Since we can't trust that report, you should do your own benchmarks. > I was wondering if there are any plans by MySQL++ to wrap this up > into some easy methods. There's nothing on the Wishlist for this, no. Since you're going to write the code anyway, consider making it a change to Query rather than code specific to your project. It might look like this: bool Query::load_file( const char* path, bool local_path, const char* table, const char* field_terminator = 0, // default \t const char* field_encloser = 0, // default none const char* field_escape = 0, // default \ const char* line_terminator = 0, // \n on *ix, \r\n on Windows const char* line_starter = 0, // default none const char* character_set = 0, // default UTF-8 const char* comment_prefix = 0, // ignore no lines bool replace_existing = false, bool ignore_duplicates = false, bool low_priority = false, int skip_lines = 0); This is a fairly ugly interface, so there could be some common wrappers, too. load_local_tab_file() would be the same as above, but with as many of the default parameters above as we can assume for a tab-delimited files eliminated. load_local_csv_file() would be similar, but with the default parameters set for traditional CSV files. load_remote_*() should also exist. There should also be a similar Query::save_file() set of interfaces. > I can imagine that by writing an SSQLS collection to a temporary > file and using it in "LOAD DATA INFILE" could outperform any > (binary) bulk-insert by far! I think you're ignoring how slow hard disks are. Unless you have a fast RAID (as opposed to slow RAID, which still is pretty common) it's going to be at least a few times slower than your network link, so it *will* materially affect the upload speed. It's even worse if the MySQL server is on the same machine as the client. We could maybe hack something up with named pipes to avoid that overhead, but I'd want to see benchmark results that prove that multi- statement INSERT is indeed many times slower than LOAD DATA INFILE before embarking on that sort of hacky effort. It would require a different implementation on POSIX and Windows, for one thing. If the benchmarks prove multi-statement INSERT to be as fast, or nearly as fast, as LOAD DATA INFILE, load_file() and friends are still useful for the case where you have (or need) delimited text files of data. The main consequence is that it means there's no need to try to create a sort of LOAD DATA FROM SSQLS feature; we already have it.