List:MySQL++« Previous MessageNext Message »
From:Rick Gutleber Date:October 13 2008 9:22pm
Subject:Using MySQL++ with lots and lots of data
View as plain text  
I've been meaning to ask this for months:

One of the things I discovered when writing and issuing large INSERT 
statements to MySQL is something called the "packet" size.  This is not 
related to TCP/IP, but rather the amount of data that you can submit 
over a connection in a single query to MySQL.  The default is 1MB.

The project I inherited moves a lot of data around, and originally it 
was coded in a way that only used the most basic of MySQL++ 
functionality, constructing and issuing SQL statements itself in an 
attempt to remain database agnostic.  After struggling with this 
overcomplex edifice for about a year, we decided that it was more 
trouble than it was worth, particularly because it was extremely 
unlikely we would switch from MySQL, and the nature of the abstraction 
meant that a lot of extra work was being done internally that was likely 
slowing things down a bit. 

In the case of bulk inserts, which we use a lot, the program was simply 
constructing a huge INSERT statement along the lines of "INSERT INTO 
table ( column1, column2... columnN) VALUES ( value1a, value 2a... 
valueNa), ( value1b, value2b... valueNb), etc.

Because of this, I have numerous places in my code where, based on the 
average size of my records when expressed as an ASCII string in an 
INSERT statement, I limit the number of records I write at once to a 
certain threshold, with plenty of headroom, so there's no chance I will 
exceed the 1MB limit.  Increasing the packet size limit is obviously a 
possibility, but you still have to check.  We write a _lot_ of data.

Once I made the switch to using MySQL++ more directly, and not building 
and issuing SQL strings everywhere for my interactions with the 
database, I realized that I could throw a whole vector of records at 
MySQL++ and it would write them out for me, no muss, no fuss.  I was 
curious how MySQL++ handled the packet size limitation and dug into the 
code to see what happens underneath.

What I discovered is that MySQL++ bypasses the problem entirely.  
Records are written individually by iterating through the collection and 
inserting each record. 

The question I have for anyone who has experienced the need for better 
performance is this:

Is there any kind of performance gain that can be made by issuing bulk 
INSERT statements over inserting records individually, or is the 
overhead negligible?  Or is there something I'm missing here?

If there is a performance gain, can MySQL++ be used to gain the 
performance advantages of bulk INSERT statements without having to do a 
whole bunch of special coding?

Or alternately, is the possibility of adding this functionality, 
assuming it's useful, feasible?  I would love to contribute to creating 
a patch for such an improvement if it makes sense. 

And finally, performance _is_ an issue for us.  We have made numerous 
improvements through proper indexing, changing to a less normalized, but 
much more optimal, schema for certain tables, and generally trying to 
improve the code.  We've also ordered more RAM for the server, and have 
made a lot of progress.  Originally the app couldn't really keep up with 
what it was trying to do.  Now it sails along smoothly, but demand keeps 
growing and we want to make the best use of our resources as possible, 
like any good project.

Towards that end: 

Are there any tips or tricks or secrets about making the most out of 
using MySQL++ for doing heavy I/O that anyone has come across or figured 
out?

Many thanks,

Rick


Thread
Using MySQL++ with lots and lots of dataRick Gutleber13 Oct
  • Re: Using MySQL++ with lots and lots of dataAndrew Sayers14 Oct
  • Re: Using MySQL++ with lots and lots of dataWarren Young14 Oct
    • Re: Using MySQL++ with lots and lots of dataRick Gutleber14 Oct
      • Re: Using MySQL++ with lots and lots of dataWarren Young15 Oct
        • Re: Using MySQL++ with lots and lots of dataRick Gutleber15 Oct
          • Re: Using MySQL++ with lots and lots of dataWarren Young15 Oct
            • Re: Using MySQL++ with lots and lots of dataRick Gutleber15 Oct
              • Re: Using MySQL++ with lots and lots of dataWarren Young15 Oct