I've been meaning to ask this for months:
One of the things I discovered when writing and issuing large INSERT
statements to MySQL is something called the "packet" size. This is not
related to TCP/IP, but rather the amount of data that you can submit
over a connection in a single query to MySQL. The default is 1MB.
The project I inherited moves a lot of data around, and originally it
was coded in a way that only used the most basic of MySQL++
functionality, constructing and issuing SQL statements itself in an
attempt to remain database agnostic. After struggling with this
overcomplex edifice for about a year, we decided that it was more
trouble than it was worth, particularly because it was extremely
unlikely we would switch from MySQL, and the nature of the abstraction
meant that a lot of extra work was being done internally that was likely
slowing things down a bit.
In the case of bulk inserts, which we use a lot, the program was simply
constructing a huge INSERT statement along the lines of "INSERT INTO
table ( column1, column2... columnN) VALUES ( value1a, value 2a...
valueNa), ( value1b, value2b... valueNb), etc.
Because of this, I have numerous places in my code where, based on the
average size of my records when expressed as an ASCII string in an
INSERT statement, I limit the number of records I write at once to a
certain threshold, with plenty of headroom, so there's no chance I will
exceed the 1MB limit. Increasing the packet size limit is obviously a
possibility, but you still have to check. We write a _lot_ of data.
Once I made the switch to using MySQL++ more directly, and not building
and issuing SQL strings everywhere for my interactions with the
database, I realized that I could throw a whole vector of records at
MySQL++ and it would write them out for me, no muss, no fuss. I was
curious how MySQL++ handled the packet size limitation and dug into the
code to see what happens underneath.
What I discovered is that MySQL++ bypasses the problem entirely.
Records are written individually by iterating through the collection and
inserting each record.
The question I have for anyone who has experienced the need for better
performance is this:
Is there any kind of performance gain that can be made by issuing bulk
INSERT statements over inserting records individually, or is the
overhead negligible? Or is there something I'm missing here?
If there is a performance gain, can MySQL++ be used to gain the
performance advantages of bulk INSERT statements without having to do a
whole bunch of special coding?
Or alternately, is the possibility of adding this functionality,
assuming it's useful, feasible? I would love to contribute to creating
a patch for such an improvement if it makes sense.
And finally, performance _is_ an issue for us. We have made numerous
improvements through proper indexing, changing to a less normalized, but
much more optimal, schema for certain tables, and generally trying to
improve the code. We've also ordered more RAM for the server, and have
made a lot of progress. Originally the app couldn't really keep up with
what it was trying to do. Now it sails along smoothly, but demand keeps
growing and we want to make the best use of our resources as possible,
like any good project.
Towards that end:
Are there any tips or tricks or secrets about making the most out of
using MySQL++ for doing heavy I/O that anyone has come across or figured
out?
Many thanks,
Rick