List: Replication
From:周振兴 Date:September 14 2011 3:45am
Subject:Re: A feature patch for semi-sync
Hi all

   Thanks, Mark. Here are more details and my test data.

2011/9/14 MARK CALLAGHAN <mdcallag@stripped>

> I think you should describe this in more detail for others to be
> interested. What you have done is very interesting, but also not
> trivial for us to understand.
>

In MySQL 5.5, the commit path with semi-sync looks roughly like this.

Original semi-sync:
    binlog_prepare (do nothing)
    innodb_prepare
    ...
    binlog_commit
    innobase_commit
    WAITING FOR THE SLAVE

With rpl_semi_sync_master_wait_before_commit=1:
    binlog_prepare (do nothing)
    innodb_prepare
    ...
    binlog_commit
    WAITING FOR THE SLAVE
    innobase_commit

I have also drawn two diagrams to explain how
"rpl_semi_sync_master_wait_before_commit" works:
  1. the original semi-sync replication:
      http://www.flickr.com/photos/26825745@N06/6145624937/in/photostream
  2. rpl_semi_sync_master_wait_before_commit=1:
      http://www.flickr.com/photos/26825745@N06/6145626791/in/photostream

In the original semi-sync, once the committing thread on the master enters
the "waiting" step, other threads on the master can already read the
transaction's changes, even though the slave has not yet received the log.
That is not what we want: other threads on the master should be able to
read the data ONLY AFTER the slave has received the log.
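
The difference between the two orderings can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class ToyMaster, the helper visible_while_waiting, and the "balance" row are hypothetical names invented for illustration and do not come from the MySQL source or from the patch. The model just records which data another session could read while the committing thread is waiting for the slave.

```python
# Toy model only: ToyMaster and visible_while_waiting are illustrative
# names and are not taken from the MySQL source or from the patch.

class ToyMaster:
    def __init__(self, wait_before_commit):
        self.wait_before_commit = wait_before_commit
        self.binlog = []      # events written by the binlog_commit step
        self.committed = {}   # data other sessions can read after innobase_commit
        self.trace = []       # order in which the steps ran

    def commit(self, key, value, slave_ack):
        """Run one commit; slave_ack stands in for waiting on the slave's reply."""
        self.trace.append("binlog_commit")
        self.binlog.append((key, value))
        if self.wait_before_commit:
            # Patched order: wait for the slave, then make the data visible.
            self.trace.append("wait_for_slave")
            slave_ack(self)
            self.trace.append("innobase_commit")
            self.committed[key] = value
        else:
            # Original order: make the data visible, then wait for the slave.
            self.trace.append("innobase_commit")
            self.committed[key] = value
            self.trace.append("wait_for_slave")
            slave_ack(self)


def visible_while_waiting(wait_before_commit):
    """Return what another session could read during the wait, plus the step order."""
    seen = {}

    def slave_ack(master):
        # A concurrent reader looks at the master exactly during the wait.
        seen.update(master.committed)

    master = ToyMaster(wait_before_commit)
    master.commit("balance", 100, slave_ack)
    return seen, master.trace


if __name__ == "__main__":
    print(visible_while_waiting(False))
    print(visible_while_waiting(True))
```

In this model the reader sees the new row during the wait only when the flag is off, which is the "phantom data" case the patch is meant to prevent.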



> In official MySQL commit sequence is:
> 1) Innodb writes changes to transaction log and something to indicate
> transaction is prepared
> 2) InnoDB syncs transaction log (this fsync can be shared by several
> transactions)
> 3) InnoDB locks prepare_commit_mutex
> 4) writes/sync the binlog (this fsync cannot be shared as
> prepare_commit_mutex is locked)
> 5) write commit record to innodb transaction log
> 6) unlock prepare_commit_mutex
> 7) sync InnoDB transaction log (this fsync can be shared)
>
> I am not familiar with replication in MySQL 5.5 and the official
> version of semi-sync. I know the Google version. I will guess that
> your change occurs between steps 4 and 5 above. But if a wait is done
> there then prepare_commit_mutex is locked and nothing else can commit
> until a slave acks. So this might have a significant impact on commit
> throughput.
>
> Eventually official MySQL should have group commit. For now it is in
> MariaDB, Percona and the Facebook patch. The group commit
> implementations remove prepare_commit_mutex, and with it gone the
> performance impact from your change might be less, but I still think
> something should be done to pipeline or group acks from the slave.
>
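
The throughput concern in the quoted text can be made concrete with a back-of-envelope bound. This is a rough sketch, not a measurement: the function name, the parameter names, and the 1 ms timings are all hypothetical, and the model ignores group commit and every other cost in the commit path. It only assumes that whatever happens while prepare_commit_mutex is held is fully serialized.

```python
def max_commits_per_second(n_threads, t_binlog_sync, t_slave_ack, wait_inside_mutex):
    """Upper bound on commits/second for a toy model of the commit path.

    All times are in seconds and are hypothetical inputs, not measurements.
    """
    # Time during which one committing thread holds prepare_commit_mutex.
    if wait_inside_mutex:
        serial = t_binlog_sync + t_slave_ack
    else:
        serial = t_binlog_sync
    # Each thread needs at least this long per commit, so n_threads threads
    # cannot exceed n_threads / per_commit even with no contention.
    per_commit = t_binlog_sync + t_slave_ack
    return min(1.0 / serial, n_threads / per_commit)


if __name__ == "__main__":
    for inside in (True, False):
        bound = max_commits_per_second(16, 0.001, 0.001, inside)
        print("wait inside mutex:", inside, "-> bound:", round(bound))
```

With the assumed 1 ms binlog sync and 1 ms slave round trip, the bound drops from about 1000 to about 500 commits per second once the wait happens inside the mutex, regardless of the thread count.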

I have not checked how InnoDB group commit works (as far as I know, group
commit works as of InnoDB Plugin 1.0.4
<http://www.innodb.com/wp/products/innodb_plugin/plugin-performance/innodb-plugin-1-0-4-group-commit-test-sysbench/>),
and I do not know whether this patch breaks it. I will check group commit later.

I have run some performance tests on this patch.

Number of updates it can do (1, 4, and 16 threads):

         1 thread   4 threads   16 threads
ESR      185.8      600         1250
Normal   201.4      624         1351

Since innodb_thread_concurrency=16, we have not tested with more threads.

http://www.flickr.com/photos/26825745@N06/6145659563/in/photostream

Number of inserts it can do (1, 4, and 16 threads):

         1 thread   4 threads   16 threads
ESR      343.9      970.97      1757.91
Normal   406.8      1065        2183.64

I used supersmack with about 150GB of our product data. The test is IO
bound. I ran every test 4 times and took the average value.
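
To read the figures above as a relative cost, the drop from "Normal" to "ESR" can be computed per thread count. This is a small sketch that assumes "ESR" is the patched build, "Normal" is the unpatched one, and that higher numbers are better; the function and variable names are made up for illustration.

```python
# Figures copied from the results above; index 0, 1, 2 = 1, 4, 16 threads.
RESULTS = {
    "update": {"ESR": [185.8, 600, 1250], "Normal": [201.4, 624, 1351]},
    "insert": {"ESR": [343.9, 970.97, 1757.91], "Normal": [406.8, 1065, 2183.64]},
}
THREADS = [1, 4, 16]


def relative_drop_percent(normal, esr):
    """How much lower the ESR figure is than the Normal figure, in percent."""
    return (normal - esr) / normal * 100.0


def overhead_table(results):
    """Map each workload to its list of relative drops, one per thread count."""
    table = {}
    for workload, rows in results.items():
        table[workload] = [
            relative_drop_percent(n, e) for n, e in zip(rows["Normal"], rows["ESR"])
        ]
    return table


if __name__ == "__main__":
    for workload, drops in overhead_table(RESULTS).items():
        for threads, drop in zip(THREADS, drops):
            print(f"{workload:6s} {threads:2d} threads: {drop:.1f}% lower with ESR")
```

Under those assumptions the drop is roughly 4% to 8% for updates and roughly 9% to 20% for inserts.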


> 2011/9/13 周振兴 <orczhou@stripped>:
> > Hi all
> >
> > The patch for semi-sync has been committed to Launchpad.net:
> > lp:~orczhou/mysql-server/ESR <https://code.launchpad.net/~orczhou/mysql-server/ESR>.
> >
> > A feature request has been filed in the bug system:
> > http://bugs.mysql.com/62174
> >
> > Semi-sync is cool. But in the semi-sync solution there are "phantom
> > reads": when InnoDB has committed a transaction and no slave has yet
> > received the binary log, another "new thread" can already read the
> > data generated by this transaction. If the database crashes and is
> > gone (cannot start up anymore) at exactly this moment, the data we
> > have already read does not exist on any slave, and it is as if the
> > transaction never happened.
> >
> > In our application architecture, if our "new thread" reads
> > "phantom data", it will go on to do work on the user's critical
> > information. So we must not read the "phantom data".
> >
> > So we added a new variable for semi-sync,
> > "rpl_semi_sync_master_wait_before_commit". Once you set this variable
> > ON, MySQL (InnoDB) ***WILL NOT COMMIT*** the transaction until at least
> > one slave has received the binary log. By default it is OFF and
> > semi-sync behaves in the original way.
> >
> > Setting this variable ON makes semi-sync behave like this:
> >  "Once you ***can read*** the data from the master, at least one slave
> >  has got the binary log"
> > whereas the original semi-sync behaves like this:
> >  "Once you ***get a response*** from the master, at least one slave has
> >  got the binary log".
> >
> > Anders.song has reviewed the code.
> >
> > *Some questions:*
> >
> > How do I move the branch from "Development" to "Experimental" or "Mature"?
> >
> > When should I propose it for merging?
> >
> > Is there a "gatekeeper" for mysql-server? How does it work?
> >
> > *Test cases*
> >
> > I have run all test cases of suite=rpl with
> > --mysqld=--loose-rpl_semi_sync_master_wait_before_commit=1.
> > All test cases behave exactly as before, except
> > rpl_semi_sync.test. What makes rpl_semi_sync.test
> > different is:
> >    In the patch, any DDL statement (trans_commit_implicit) is an exception
> > and will NOT wait for the slave's reply.
> >    The reason is that the patch adds a new HOOK in binlog_commit. When
> > binlog_commit returns, we wait for the slave's reply,
> > so every transaction waits for the slave's reply before it commits. But a
> > DDL statement will NEVER invoke binlog_commit.
> > We think it is OK that DDL is the only special case.
> >
> > A new test case is being written to verify that this patch always behaves
> > as expected.
> >
> > You can get all the details from the code:
> > lp:~orczhou/mysql-server/ESR <https://code.launchpad.net/~orczhou/mysql-server/ESR>.
> >
> >
> >
> > --
> > Best regards
> > -----------------------------------------------------
> > 周振兴  159-9004-0105 Taobao.com
> > http://orczhou.com
> >
>
>
>
> --
> Mark Callaghan
> mdcallag@stripped
>



-- 
Best regards
-----------------------------------------------------
周振兴  159-9004-0105 Taobao.com
http://orczhou.com
