List: MySQL++
From: Warren Young
Date: October 14 2008 11:30am
Subject: Re: Using MySQL++ with lots and lots of data
On Oct 13, 2008, at 3:22 PM, Rick Gutleber wrote:

> Increasing the packet size limit

I don't see how this would help.  TCP's window size will be smaller  
than the MySQL packet size.  Once that's filled, you can keep on  
pushing more data into the stack, and it will queue it for you (or  
block the sender, same thing), but it won't make the data go out any  
faster.

MySQL has a packet size parameter to handle the case where a *single  
record* may be too big for a default packet.  You increase it only to  
fix this problem, not to get better performance.
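
(For reference, the parameter in question is max_allowed_packet, set on
the server and, in newer versions, on the client side as well.  The
value here is just an example:

```ini
# my.cnf -- only needs to exceed the largest single query you send
[mysqld]
max_allowed_packet = 64M
```

Again: raising it is a correctness knob, not a performance knob.)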

Have you eliminated the possibility that disk I/O is the bottleneck?   
If you have less than a gigabit per second of data continually going  
between client and server, it could be disk I/O on either end holding  
you up, not packet overhead.

It takes a real server -- as opposed to a desktop PC in a rack case --  
to continually fill a gigabit pipe from disk.  This is doubly the case  
on the DB server, which has more than simple disk I/O overhead.  The  
DB server's disk subsystem has to be fast enough to do all the DB  
stuff on top of sinking a gigabit per second of data.

> Is there any kind of performance gain that can be made by issuing  
> bulk INSERT statements over inserting records individually

Yes, up to the point that you can keep the TCP window full, or at  
least continually "sated".  Beyond that, no.

> can MySQL++ be used to gain the performance advantages of bulk  
> INSERT statements without having to do a whole bunch of special  
> coding?

You can use Query::insert(iter, iter) for this.  Wrap it in a loop,  
passing a subrange of the vector on each iteration, covering the  
whole vector a chunk at a time.
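
A rough sketch of the looping part.  The helper name for_each_chunk  
and the chunk size are mine, not MySQL++ API; only the commented  
query.insert()/execute()/reset() calls are the real library interface:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Generic chunking helper: calls fn(first, last) on successive
// subranges of at most 'chunk' elements, covering the whole range.
template <typename Iter, typename Fn>
void for_each_chunk(Iter begin, Iter end, std::size_t chunk, Fn fn)
{
    while (begin != end) {
        std::size_t left =
            static_cast<std::size_t>(std::distance(begin, end));
        Iter stop = begin;
        std::advance(stop, std::min(chunk, left));
        fn(begin, stop);
        begin = stop;
    }
}

// In MySQL++ terms, fn would be something like:
//
//   for_each_chunk(rows.begin(), rows.end(), 1000,
//       [&](Iter first, Iter last) {
//           query.insert(first, last);  // build INSERT for this subrange
//           query.execute();
//           query.reset();              // ready the Query for reuse
//       });
```

The chunk size (1000 above) is the tuning knob; picking it well is  
exactly what the policy discussion below is about.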

> I would love to contribute to creating a patch for such an  
> improvement if it makes sense.


We accept your offer. :)

A possible design:

Add Query::insert(container, policy), on the pattern of  
Query::storein().  The policy parameter is a pointer to a policy object.

If you pass 0 for the policy, it gives the same effect as  
Query::insert(con.begin(), con.end()): the whole container is expected  
to fit into a single MySQL packet.  Can't make this a default  
parameter, else you create a possibility of ambiguity w.r.t.  
Query::insert(const SSQLS&).

You want the ability to separate policy from function because Query  
can't hard-code details like the server's configured max packet size,  
the min/max/avg object size, etc.

Policy object suggestions:

1. MaxPacketPolicy: create it with a size parameter, and it will pack  
as many records into the query as possible, up to that packet size  
limit.  The difficulty here is that you probably have to build each  
values() fragment separately, then concatenate them, because there's  
no easier way to tell exactly how many bytes a given SSQLS holds in  
SQL terms.  You'd be betting that the savings on disk and network  
I/O pay for the extra memory overhead.

2. SizeThresholdPolicy: like above, but it doesn't check the size of  
the records before inserting them.  It just keeps a running count of  
query size, and when it goes over, it sends the query.  This works  
well when you give it a size on the order of the TCP window size,  
which should be much lower than the MySQL packet size.  As long as the  
max record size is smaller than the MySQL max packet less the size  
threshold, you never overflow.  Relative to the max-packet policy, it  
saves the cost of the extra data copy, but it loses the guaranteed  
safety of a hard upper packet size limit.

3. RecordCountPolicy: N records per query, every time.  This just  
takes us further down the safety vs. simplicity axis toward simplicity.
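
To make the three concrete, here's one hypothetical shape the policy  
interface could take.  Every name below is illustrative only; none of  
this exists in MySQL++ today:

```cpp
#include <cstddef>

// Hypothetical base class for the proposed Query::insert(container,
// policy) overload.  Before appending a record whose VALUES() fragment
// is 'next_size' bytes, the query builder asks whether it should send
// what it has accumulated so far.
class InsertPolicy {
public:
    virtual ~InsertPolicy() { }
    virtual bool should_flush(std::size_t pending_records,
                              std::size_t pending_bytes,
                              std::size_t next_size) const = 0;
};

// 1. Never exceed a hard packet limit; requires knowing each record's
//    SQL size before appending it.
class MaxPacketPolicy : public InsertPolicy {
public:
    explicit MaxPacketPolicy(std::size_t limit) : limit_(limit) { }
    bool should_flush(std::size_t, std::size_t pending_bytes,
                      std::size_t next_size) const
        { return pending_bytes + next_size > limit_; }
private:
    std::size_t limit_;
};

// 2. Flush once the accumulated query passes a byte threshold; only a
//    running total is kept, record sizes are never checked in advance.
class SizeThresholdPolicy : public InsertPolicy {
public:
    explicit SizeThresholdPolicy(std::size_t bytes) : bytes_(bytes) { }
    bool should_flush(std::size_t, std::size_t pending_bytes,
                      std::size_t) const
        { return pending_bytes >= bytes_; }
private:
    std::size_t bytes_;
};

// 3. Flush every N records, ignoring byte sizes entirely.
class RecordCountPolicy : public InsertPolicy {
public:
    explicit RecordCountPolicy(std::size_t n) : n_(n) { }
    bool should_flush(std::size_t pending_records, std::size_t,
                      std::size_t) const
        { return pending_records >= n_; }
private:
    std::size_t n_;
};
```

A null policy pointer would then mean "never flush early", giving the  
single-packet behavior of Query::insert(con.begin(), con.end()).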

This design lets you try several approaches, benchmark each, and  
decide on one based on what you see in your particular situation.