On Oct 13, 2008, at 3:22 PM, Rick Gutleber wrote:
> Increasing the packet size limit
I don't see how this would help. TCP's window size will be smaller
than the MySQL packet size. Once that's filled, you can keep on
pushing more data into the stack, and it will queue it for you (or
block the sender, same thing), but it won't make the data go out any
faster.
MySQL has a packet size parameter to handle the case where a *single
record* may be too big for a default packet. You increase it only to
fix this problem, not to get better performance.
Have you eliminated the possibility that disk I/O is the bottleneck?
If you have less than a gigabit per second of data continually going
between client and server, it could be disk I/O on either end holding
you up, not packet overhead.
It takes a real server -- as opposed to a desktop PC in a rack case --
to continually fill a gigabit pipe from disk. This is doubly the case
on the DB server, which has more than simple disk I/O overhead. The
DB server's disk subsystem has to be fast enough to do all the DB
stuff on top of sinking a gigabit per second of data.
> Is there any kind of performance gain that can be made by issuing
> bulk INSERT statements over inserting records individually
Yes, up to the point that you can keep the TCP window full, or at
least continually "sated". Beyond that, no.
> can MySQL++ be used to gain the performance advantages of bulk
> INSERT statements without having to do a whole bunch of special
> coding?
You can use Query::insert(iter, iter) for this. You wrap it in a
loop, passing a subranges of the vector on each iteration, covering
the whole vector a chunk at a time.
> I would love to contribute to creating a patch for such an
> improvement if it makes sense.
We accept your offer. :)
A possible design:
Add Query::insert(container, policy), on the pattern of
Query::storein(). The policy parameter is a pointer to a policy object.
If you pass 0 for the policy, it gives the same effect as
Query::insert(con.begin(), con.end()): the whole container is expected
to fit into a single MySQL packet. Can't make this a default
parameter, else you create a possibility of ambiguity w.r.t.
Query::insert(const SSQLS&).
You want the ability to separate policy from function because Query
can't hard-code details like the server's configured max packet size,
the min/max/avg object size, etc.
Policy object suggestions:
1. MaxPacketPolicy: create it with a size parameter, and it will pack
as many records into the query as possible, up to that packet size
limit. The difficulty here is that you probably have to build each
values() fragment separately, then concatenate them, because there's
no easier way to tell exactly how many bytes a given SSQLS holds in
SQL terms. You'd be betting the extra memory overhead pays for the
savings on disk and network I/O.
2. SizeThresholdPolicy: like above, but it doesn't check the size of
the records before inserting them. It just keeps a running count of
query size, and when it goes over, it sends the query. This works
well when you give it a size on the order of the TCP window size,
which should be much lower than the MySQL packet size. As long as the
max record size is smaller than the MySQL max packet less the size
threshold, you never overflow. Relative to the max-packet policy, it
saves the cost of the extra data copy, but it loses the guaranteed
safety of a hard upper packet size limit.
3. RecordCountPolicy: N records per query, every time. This just
takes us further down the safety vs. simplicity axis toward simplicity.
This design lets you try multiple things, benchmarking each, and
deciding on one based on what you see in your particular situation.