Warren, I think your suggestions are very reasonable. I have a number
of initial questions relating to coding style. I'm pretty OCD about my
own coding style, so if I'm contributing to someone else's code I want
to adhere to the style used as much as possible.
1. Do you envision the policy as an actual object (i.e., class)? I
could easily see the idea of a policy being applicable to other methods
and classes, so perhaps it's best to think in those terms.
2. If you're thinking object, would you want a whole hierarchy or just
some separate classes? I would naturally think of a class hierarchy
where everything is derived from a vanilla Policy class, but honestly I
don't see what purpose that would serve.
3. I would give the InsertPolicy two members, an enum to represent the
policy type (one of the 4 types you suggested), and an integer size
value. Is the right track?
4. The code seems to be organized with a single class in each file,
which is the way I work, too. Would you want Policy class(es) to go in
its (their) own file(s)?
5. Or maybe the policy should be a private struct defined only in the
Query class? If you believe the policy concept is likely to be limited
to this one implementation or a very small number, then perhaps we
should define it as part of the Query class and not pollute the
namespace with a bunch of stuff only used in one place?
The idea here is not "What do I do?" because that's straightforward, but
"How should I get started to fit it in with your vision of MySQL++ and
coding style?"
Or, would you prefer me to use my own judgement and show you when I have
something? ;-)
Rick
p.s. Now I finally understand why you never created the obvious and
intuitive insert( container ) template method... it would conflict with
the insert( SSQLS ) method.
Warren Young wrote:
> On Oct 13, 2008, at 3:22 PM, Rick Gutleber wrote:
>
>> Increasing the packet size limit
> I don't see how this would help. TCP's window size will be smaller
> than the MySQL packet size. Once that's filled, you can keep on
> pushing more data into the stack, and it will queue it for you (or
> block the sender, same thing), but it won't make the data go out any
> faster.
>
> MySQL has a packet size parameter to handle the case where a *single
> record* may be too big for a default packet. You increase it only to
> fix this problem, not to get better performance.
>
> Have you eliminated the possibility that disk I/O is the bottleneck?
> If you have less than a gigabit per second of data continually going
> between client and server, it could be disk I/O on either end holding
> you up, not packet overhead.
>
> It takes a real server -- as opposed to a desktop PC in a rack case --
> to continually fill a gigabit pipe from disk. This is doubly the case
> on the DB server, which has more than simple disk I/O overhead. The
> DB server's disk subsystem has to be fast enough to do all the DB
> stuff on top of sinking a gigabit per second of data.
>
>> Is there any kind of performance gain that can be made by issuing
>> bulk INSERT statements over inserting records individually
>
> Yes, up to the point that you can keep the TCP window full, or at
> least continually "sated". Beyond that, no.
>
>> can MySQL++ be used to gain the performance advantages of bulk INSERT
>> statements without having to do a whole bunch of special coding?
>
> You can use Query::insert(iter, iter) for this. You wrap it in a
> loop, passing a subranges of the vector on each iteration, covering
> the whole vector a chunk at a time.
>
>> I would love to contribute to creating a patch for such an
>> improvement if it makes sense.
>
>
> We accept your offer. :)
>
> A possible design:
>
> Add Query::insert(container, policy), on the pattern of
> Query::storein(). The policy parameter is a pointer to a policy object.
>
> If you pass 0 for the policy, it gives the same effect as
> Query::insert(con.begin(), con.end()): the whole container is expected
> to fit into a single MySQL packet. Can't make this a default
> parameter, else you create a possibility of ambiguity w.r.t.
> Query::insert(const SSQLS&).
>
> You want the ability to separate policy from function because Query
> can't hard-code details like the server's configured max packet size,
> the min/max/avg object size, etc.
>
> Policy object suggestions:
>
> 1. MaxPacketPolicy: create it with a size parameter, and it will pack
> as many records into the query as possible, up to that packet size
> limit. The difficulty here is that you probably have to build each
> values() fragment separately, then concatenate them, because there's
> no easier way to tell exactly how many bytes a given SSQLS holds in
> SQL terms. You'd be betting the extra memory overhead pays for the
> savings on disk and network I/O.
>
> 2. SizeThresholdPolicy: like above, but it doesn't check the size of
> the records before inserting them. It just keeps a running count of
> query size, and when it goes over, it sends the query. This works
> well when you give it a size on the order of the TCP window size,
> which should be much lower than the MySQL packet size. As long as the
> max record size is smaller than the MySQL max packet less the size
> threshold, you never overflow. Relative to the max-packet policy, it
> saves the cost of the extra data copy, but it loses the guaranteed
> safety of a hard upper packet size limit.
>
> 3. RecordCountPolicy: N records per query, every time. This just
> takes us further down the safety vs. simplicity axis toward simplicity.
>
> This design lets you try multiple things, benchmarking each, and
> deciding on one based on what you see in your particular situation.
>