2011/9/13 周振兴 <orczhou@stripped>:
> Hi all
> Thanks, Mark. Here are more details and my test data.
> 2011/9/14 MARK CALLAGHAN <mdcallag@stripped>
>>
>> I think you should describe this in more detail for others to be
>> interested. What you have done is very interesting, but also not
>> trivial for us to understand.
>
> In MySQL-5.5 the semi-sync code is something like this:
>
> original semi-sync:
>   binlog_prepare (do nothing)
>   innodb_prepare
>   ...
>   binlog_commit
>   innobase_commit
>   WAITING FOR THE SLAVE
>
> Now, with rpl_semi_sync_master_wait_before_commit=1:
>   binlog_prepare (do nothing)
>   innodb_prepare
>   ...
>   binlog_commit
>   WAITING FOR THE SLAVE
>   innobase_commit
I have not looked at the 5.5 code, but I assume "WAITING FOR THE SLAVE"
is done while that connection holds prepare_commit_mutex, so all
commits are blocked until the wait ends. From your performance numbers
I assume you tested this on a system with a real (5 milliseconds or
greater) fsync latency. If the binlog fsync takes 10ms and the average
wait for a slave ack is 1ms, then your change won't reduce throughput
by much. But on a system with HW RAID or SSD/flash, fsync latency is
much lower and your change will reduce performance by much more. The
overhead will also be much larger on versions of MySQL that use group
commit for InnoDB+binlog (which is in MariaDB, Percona and the
Facebook patch).
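
To make the arithmetic concrete, here is a rough back-of-envelope
sketch in Python. It assumes a toy model in which commits are fully
serialized and each commit holds the mutex for the binlog fsync plus
the slave ack wait. The function names and the latency values other
than 10ms/1ms are my own illustrations, not measurements.

```python
def commits_per_second(fsync_ms, ack_wait_ms=0.0):
    """Toy model: one commit at a time, each costing fsync plus ack wait."""
    return 1000.0 / (fsync_ms + ack_wait_ms)


def throughput_loss(fsync_ms, ack_wait_ms):
    """Fraction of throughput lost by adding the ack wait under the mutex."""
    baseline = commits_per_second(fsync_ms)
    with_wait = commits_per_second(fsync_ms, ack_wait_ms)
    return 1.0 - with_wait / baseline


if __name__ == "__main__":
    # Assumed fsync latencies: slow disk, HW RAID, flash (illustrative only).
    for fsync_ms in (10.0, 1.0, 0.1):
        loss = throughput_loss(fsync_ms, ack_wait_ms=1.0)
        print(f"fsync={fsync_ms:5.1f} ms  ack=1.0 ms  loss={loss:.0%}")
```

In this model the loss is ack / (fsync + ack): about 9% with a 10ms
fsync, but about 91% with a 0.1ms fsync.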
I am not trying to criticize your work. I think it is great work and hope
you present this at one of the community conferences. But can you make
the overhead of this low for systems with a fast fsync? When fsync is
fast and the workload has few concurrent transactions there isn't much
that can be done to avoid the overhead. But for workloads with a lot
of concurrency and on a MySQL variant that supports group commit it
might be possible to minimize the overhead from this.
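
As a rough illustration of why concurrency plus group commit could
help, the sketch below extends the same toy model. It assumes,
hypothetically, that a group of N transactions shares one binlog fsync
and one slave ack wait. That is an assumption for the sake of the
arithmetic, not a description of how any existing patch works.

```python
def grouped_commits_per_second(group_size, fsync_ms, ack_wait_ms):
    """Toy model: one fsync and one ack wait are paid per group of commits."""
    return group_size * 1000.0 / (fsync_ms + ack_wait_ms)


if __name__ == "__main__":
    # Assumed values: fast storage (0.1 ms fsync) and a 1 ms slave ack.
    for group_size in (1, 4, 16):
        rate = grouped_commits_per_second(group_size, 0.1, 1.0)
        print(f"group_size={group_size:2d}  commits/sec={rate:8.0f}")
```

With a group size of 1 this is the serialized case, so the ack wait
dominates. Larger groups spread the same wait over more transactions,
which is only possible when the workload has enough concurrency.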
--
Mark Callaghan
mdcallag@stripped