Hi all
Thanks, Mark. Here are more details and my test data.
2011/9/14 MARK CALLAGHAN <mdcallag@stripped>
> I think you should describe this in more detail for others to be
> interested. What you have done is very interesting, but also not
> trivial for us to understand.
>
In MySQL-5.5 the semi-sync code is something like this:

Original semi-sync:
    binlog_prepare (do nothing)
    innodb_prepare
    ...
    binlog_commit
    innobase_commit
    WAITING FOR THE SLAVE

Now, with rpl_semi_sync_master_wait_before_commit=1:
    binlog_prepare (do nothing)
    innodb_prepare
    ...
    binlog_commit
    WAITING FOR THE SLAVE
    innobase_commit
I have also drawn some diagrams to explain
how "rpl_semi_sync_master_wait_before_commit" works:
1. the original semi-sync replication:
http://www.flickr.com/photos/26825745@N06/6145624937/in/photostream
2. rpl_semi_sync_master_wait_before_commit=1:
http://www.flickr.com/photos/26825745@N06/6145626791/in/photostream
In the original semi-sync, when the master enters the "Waiting" step, other
threads on the master can already read the transaction's modifications, even
though the slave has not yet received the log. That is not what we want. We
want threads on the master to be able to read the data ONLY AFTER the slave
has received the log.
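
The difference between the two orderings can be sketched with a toy model in Python. This is not MySQL code: the function names commit_steps and visible_before_ack are made up for illustration, and the only assumption is that a change becomes readable by other threads once innobase_commit has run.

```python
# Toy model of the two commit orderings described above; not MySQL code.
# Assumption: a transaction's changes become readable by other threads
# on the master as soon as innobase_commit has run.

def commit_steps(wait_before_commit):
    """Return the ordered steps of one commit on the master."""
    steps = ["binlog_prepare", "innodb_prepare", "binlog_commit"]
    if wait_before_commit:
        # rpl_semi_sync_master_wait_before_commit=1
        steps += ["wait_for_slave", "innobase_commit"]
    else:
        # original semi-sync
        steps += ["innobase_commit", "wait_for_slave"]
    return steps


def visible_before_ack(steps):
    """True if other threads can read the change before the slave ack."""
    return steps.index("innobase_commit") < steps.index("wait_for_slave")


if __name__ == "__main__":
    for flag in (False, True):
        steps = commit_steps(flag)
        print(flag, " -> ".join(steps), visible_before_ack(steps))
```

With the flag off, the model reports that the change is visible before the ack; with the flag on, it is not.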
> In official MySQL commit sequence is:
> 1) Innodb writes changes to transaction log and something to indicate
> transaction is prepared
> 2) InnoDB syncs transaction log (this fsync can be shared by several
> transactions)
> 3) InnoDB locks prepare_commit_mutex
> 4) writes/sync the binlog (this fsync cannot be shared as
> prepare_commit_mutex is locked)
> 5) write commit record to innodb transaction log
> 6) unlock prepare_commit_mutex
> 7) sync InnoDB transaction log (this fsync can be shared)
>
> I am not familiar with replication in MySQL 5.5 and the official
> version of semi-sync. I know the Google version. I will guess that
> your change occurs between steps 4 and 5 above. But if a wait is done
> there then prepare_commit_mutex is locked and nothing else can commit
> until a slave acks. So this might have a significant impact on commit
> throughput.
>
> Eventually official MySQL should have group commit. For now it is in
> MariaDB, Percona and the Facebook patch. The group commit
> implementations remove prepare_commit_mutex, and with it removed the
> performance impact from your change might be less, but I still think
> something should be done to pipeline or group acks from the slave.
>
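
To make the throughput concern above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the wait really does happen while prepare_commit_mutex is held (that is Mark's guess, not something verified here), and all latencies are made-up illustrative values, not measurements.

```python
# Rough upper bound on commit throughput when commits are serialized by
# one mutex. Assumption: the slave-ack wait happens while the mutex is
# held. All latencies below are hypothetical values for illustration.

def max_commits_per_sec(binlog_sync_ms, slave_ack_ms, commit_record_ms):
    """Commits per second if each commit holds the mutex for the sum."""
    held_ms = binlog_sync_ms + slave_ack_ms + commit_record_ms
    return 1000.0 / held_ms


if __name__ == "__main__":
    without_wait = max_commits_per_sec(1.0, 0.0, 0.1)
    with_wait = max_commits_per_sec(1.0, 0.5, 0.1)
    print(f"no ack wait under mutex: {without_wait:.0f} commits/s at most")
    print(f"ack wait under mutex:    {with_wait:.0f} commits/s at most")
```

Under this model, whatever time the ack takes adds directly to the time every other commit has to wait.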
I have not checked how InnoDB group commit works (as far as I know, group
commit works as of InnoDB Plugin 1.0.4:
<http://www.innodb.com/wp/products/innodb_plugin/plugin-performance/innodb-plugin-1-0-4-group-commit-test-sysbench/>),
and I do not know whether this patch breaks it. I will check group commit later.
I have run some performance tests on this patch.

Number of updates it can do (1, 4, and 16 threads):

            1        4        16
ESR       185.8    600      1250
Normal    201.4    624      1351

Since innodb_thread_concurrency=16, we have not tested with more threads.
http://www.flickr.com/photos/26825745@N06/6145659563/in/photostream
Number of inserts it can do (1, 4, and 16 threads):

            1        4        16
ESR       343.9    970.97   1757.91
Normal    406.8    1065     2183.64
I used supersmack with about 150GB of our production data. The test is
I/O bound. I ran every test 4 times and took the average value.
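
The relative cost can be computed directly from the numbers above. The short Python sketch below assumes that "ESR" is the patched build with the new variable ON and "Normal" is the original semi-sync; the function name slowdown_percent is made up for illustration.

```python
# Relative slowdown of ESR versus Normal, from the figures quoted above.
# Assumption: "ESR" is the patched build with the variable ON and
# "Normal" is the original semi-sync.

THREADS = [1, 4, 16]
RESULTS = {
    "update": {"ESR": [185.8, 600, 1250], "Normal": [201.4, 624, 1351]},
    "insert": {"ESR": [343.9, 970.97, 1757.91], "Normal": [406.8, 1065, 2183.64]},
}


def slowdown_percent(esr, normal):
    """How far ESR falls below Normal, as a percentage of Normal."""
    return (normal - esr) / normal * 100.0


if __name__ == "__main__":
    for op, rows in RESULTS.items():
        for n, esr, normal in zip(THREADS, rows["ESR"], rows["Normal"]):
            pct = slowdown_percent(esr, normal)
            print(f"{op:6s} {n:2d} threads: ESR is {pct:.1f}% below Normal")
```

By this measure the patch costs roughly 4% to 8% on updates and roughly 9% to 20% on inserts in these runs.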
> 2011/9/13 周振兴 <orczhou@stripped>:
> > Hi all
> >
> > The patch for semi-sync has been committed to Launchpad.net:
> > lp:~orczhou/mysql-server/ESR <https://code.launchpad.net/~orczhou/mysql-server/ESR>.
> >
> > A feature request has been filed in the bug system:
> > http://bugs.mysql.com/62174
> >
> > Semi-sync is cool. But with semi-sync there are "phantom reads": when
> > InnoDB has committed a transaction and no slave has yet received the
> > binary log, another "new thread" can already read the data generated by
> > this transaction. If the database crashes and is gone (cannot start up
> > anymore) at exactly this moment, then although we have read this data, it
> > does not exist on any slave, and we treat the transaction as if it never
> > happened.
> >
> > In our application architecture, if our "new thread" reads "phantom
> > data", we will go on to do work involving users' critical information.
> > So we must not read the "phantom data".
> >
> > So we added a new variable for semi-sync,
> > "rpl_semi_sync_master_wait_before_commit". Once you set this variable ON,
> > MySQL (InnoDB) ***WILL NOT COMMIT*** the transaction until at least one
> > slave has received the binary log. By default it is OFF and semi-sync
> > behaves in the original way.
> >
> > Setting this variable ON makes semi-sync behave like this:
> > "Once you ***can read*** the data from the master, at least one slave has
> > got the binary log"
> > whereas the original semi-sync behaves like this:
> > "Once you ***get a response*** from the master, at least one slave has got
> > the binary log".
> >
> > Anders.song has reviewed the code.
> >
> > *Some questions:*
> >
> > How do I move the branch from "Development" to "Experimental" or "Mature"?
> >
> > When should I propose it for merging?
> >
> > Is there a "gatekeeper" for mysql-server? How does it work?
> >
> > *Test case*
> >
> > I have run all test cases of suite=rpl with
> > --mysqld=--loose-rpl_semi_sync_master_wait_before_commit=1.
> > All test cases behave exactly as before, except rpl_semi_sync.test.
> > What makes rpl_semi_sync.test different is this:
> > In the patch, any DDL statement (trans_commit_implicit) is an exception
> > and will NOT wait for the slave's reply.
> > The reason is that the patch adds a new hook in binlog_commit: when
> > binlog_commit returns, we wait for the slave's reply.
> > So before every transaction commit, the master waits for the slave's
> > reply. But a DDL statement will NEVER invoke binlog_commit.
> > We think it is OK that only DDL is a special case.
> >
> > A new test case is being written to verify that this patch always behaves
> > as expected.
> >
> > You can get all the details from the code:
> > lp:~orczhou/mysql-server/ESR <https://code.launchpad.net/~orczhou/mysql-server/ESR>.
> >
> >
> >
> > --
> > Best regards,
> > -----------------------------------------------------
> > 周振兴 159-9004-0105 Taobao.com
> > http://orczhou.com
> >
>
>
>
> --
> Mark Callaghan
> mdcallag@stripped
>
--
Best regards,
-----------------------------------------------------
周振兴 159-9004-0105 Taobao.com
http://orczhou.com