List:MySQL++« Previous MessageNext Message »
From:Rick Gutleber Date:October 14 2008 9:20pm
Subject:Re: Using MySQL++ with lots and lots of data
View as plain text  
Warren, I think your suggestions are very reasonable.  I have a number 
of initial questions relating to coding style.  I'm pretty OCD about my 
own coding style, so if I'm contributing to someone else's code I want 
to adhere to the style used as much as possible.

1.  Do you envision the policy as an actual object (i.e., class)?  I 
could easily see the idea of a policy being applicable to other methods 
and classes, so perhaps it's best to think in those terms.

2.  If you're thinking object, would you want a whole hierarchy or just 
some separate classes?  I would naturally think of a class hierarchy 
where everything is derived from a vanilla Policy class, but honestly I 
don't see what purpose that would serve.

3.  I would give the InsertPolicy two members, an enum to represent the 
policy type (one of the 4 types you suggested), and an integer size 
value.  Is the right track?

4.  The code seems to be organized with a single class in each file, 
which is the way I work, too.  Would you want Policy class(es) to go in 
its (their) own file(s)?

5.  Or maybe the policy should be a private struct defined only in the 
Query class?  If you believe the policy concept is likely to be limited 
to this one implementation or a very small number, then perhaps we 
should define it as part of the Query class and not pollute the 
namespace with a bunch of stuff only used in one place?

The idea here is not "What do I do?" because that's straightforward, but 
"How should I get started to fit it in with your vision of MySQL++ and 
coding style?"

Or, would you prefer me to use my own judgement and show you when I have 
something?  ;-)


p.s.  Now I finally understand why you never created the obvious and 
intuitive insert( container ) template method... it would conflict with 
the insert( SSQLS ) method.

Warren Young wrote:
> On Oct 13, 2008, at 3:22 PM, Rick Gutleber wrote:
>> Increasing the packet size limit
> I don't see how this would help.  TCP's window size will be smaller 
> than the MySQL packet size.  Once that's filled, you can keep on 
> pushing more data into the stack, and it will queue it for you (or 
> block the sender, same thing), but it won't make the data go out any 
> faster.
> MySQL has a packet size parameter to handle the case where a *single 
> record* may be too big for a default packet.  You increase it only to 
> fix this problem, not to get better performance.
> Have you eliminated the possibility that disk I/O is the bottleneck?  
> If you have less than a gigabit per second of data continually going 
> between client and server, it could be disk I/O on either end holding 
> you up, not packet overhead.
> It takes a real server -- as opposed to a desktop PC in a rack case -- 
> to continually fill a gigabit pipe from disk.  This is doubly the case 
> on the DB server, which has more than simple disk I/O overhead.  The 
> DB server's disk subsystem has to be fast enough to do all the DB 
> stuff on top of sinking a gigabit per second of data.
>> Is there any kind of performance gain that can be made by issuing 
>> bulk INSERT statements over inserting records individually
> Yes, up to the point that you can keep the TCP window full, or at 
> least continually "sated".  Beyond that, no.
>> can MySQL++ be used to gain the performance advantages of bulk INSERT 
>> statements without having to do a whole bunch of special coding?
> You can use Query::insert(iter, iter) for this.  You wrap it in a 
> loop, passing a subranges of the vector on each iteration, covering 
> the whole vector a chunk at a time.
>> I would love to contribute to creating a patch for such an 
>> improvement if it makes sense.
> We accept your offer. :)
> A possible design:
> Add Query::insert(container, policy), on the pattern of 
> Query::storein().  The policy parameter is a pointer to a policy object.
> If you pass 0 for the policy, it gives the same effect as 
> Query::insert(con.begin(), con.end()): the whole container is expected 
> to fit into a single MySQL packet.  Can't make this a default 
> parameter, else you create a possibility of ambiguity w.r.t. 
> Query::insert(const SSQLS&).
> You want the ability to separate policy from function because Query 
> can't hard-code details like the server's configured max packet size, 
> the min/max/avg object size, etc.
> Policy object suggestions:
> 1. MaxPacketPolicy: create it with a size parameter, and it will pack 
> as many records into the query as possible, up to that packet size 
> limit.  The difficulty here is that you probably have to build each 
> values() fragment separately, then concatenate them, because there's 
> no easier way to tell exactly how many bytes a given SSQLS holds in 
> SQL terms.  You'd be betting the extra memory overhead pays for the 
> savings on disk and network I/O.
> 2. SizeThresholdPolicy: like above, but it doesn't check the size of 
> the records before inserting them.  It just keeps a running count of 
> query size, and when it goes over, it sends the query.  This works 
> well when you give it a size on the order of the TCP window size, 
> which should be much lower than the MySQL packet size.  As long as the 
> max record size is smaller than the MySQL max packet less the size 
> threshold, you never overflow.  Relative to the max-packet policy, it 
> saves the cost of the extra data copy, but it loses the guaranteed 
> safety of a hard upper packet size limit.
> 3. RecordCountPolicy: N records per query, every time.  This just 
> takes us further down the safety vs. simplicity axis toward simplicity.
> This design lets you try multiple things, benchmarking each, and 
> deciding on one based on what you see in your particular situation.

Using MySQL++ with lots and lots of dataRick Gutleber13 Oct
  • Re: Using MySQL++ with lots and lots of dataAndrew Sayers14 Oct
  • Re: Using MySQL++ with lots and lots of dataWarren Young14 Oct
    • Re: Using MySQL++ with lots and lots of dataRick Gutleber14 Oct
      • Re: Using MySQL++ with lots and lots of dataWarren Young15 Oct
        • Re: Using MySQL++ with lots and lots of dataRick Gutleber15 Oct
          • Re: Using MySQL++ with lots and lots of dataWarren Young15 Oct
            • Re: Using MySQL++ with lots and lots of dataRick Gutleber15 Oct
              • Re: Using MySQL++ with lots and lots of dataWarren Young15 Oct