From: Mats Kindahl
Date: January 31 2013 8:39am
Subject: Re: reducing fsyncs during handlerton->prepare and handlerton->commit in 5.6
List-Archive: http://lists.mysql.com/internals/38707
Message-Id: <510A2DB5.5040107@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 01/30/2013 10:43 PM, Zardosht Kasheff wrote:
> Hello all,
>
> As I understand it, for transactional storage engines to be in sync
> with the binary log after recovery, the storage engine must support
> two phase commit (aka XA). In MySQL 5.5 and MariaDB 5.5, the engine
> must fsync its log when a transaction prepares and when a transaction
> commits. So, for each transaction, there are three fsyncs, one for
> prepare, one for the binary log, and one for commit.
>
> I also understand that this requirement has changed in MySQL 5.6 and
> MariaDB 10.0. With those releases, there is a way for storage engines
> to reduce their fsyncs. I also believe that MariaDB 5.5 still requires
> all of these fsyncs.
>
> My questions are:
> - in MySQL 5.6 and MariaDB 10.0, what are the fsyncing requirements
> for storage engines during prepare and commit?

In MySQL 5.6, the storage engine has to sync on prepare (or make the
state durable some other way). On commit, the storage engine can either
sync or not. If the storage engine decides not to sync and there is a
crash, crash recovery will commit the prepared transaction if it was
written to the binary log and roll it back otherwise.

> - are there new APIs that the storage engine must comply with in
> order to get these benefits?

In MySQL 5.6, there are no new APIs that you *have* to comply with.
However, the server's commit procedure sets the thd->durability_property
flag to HA_IGNORE_DURABILITY if you can safely ignore the durability
requirements for the commit and leave the committing and/or aborting of
the transactions to the recovery procedure.

> - in MySQL 5.6, I see new handlerton methods commit_low and
> prepare_low. What do these APIs do? What is their contract?

Are you referring to the ha_{commit,prepare}_low() functions? These are
used to do the actual commit or rollback call into all engines
participating in the transaction. The higher-level commit functions
(e.g., ha_commit_trans) call into the transaction coordinator. The
binary log, when it acts as the coordinator, batches its writes to the
binary log, so it has to batch these low-level commits as well.

These functions are intended to be used by anybody implementing a
transaction coordinator (the binary log is one example), but should not
be necessary for any storage engine writer to use.

Just my few cents,
Mats Kindahl

>
> Thanks
> -Zardosht
>

-- 
Senior Principal Software Developer
Oracle, MySQL Department
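
The recovery rule described in the reply (commit a prepared transaction
if it was written to the binary log, roll it back otherwise) could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not MySQL's actual recovery code: Xid, Resolution, Decision,
and resolve_prepared are hypothetical names invented for this sketch,
and real XIDs are richer than a plain integer.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a transaction identifier (not MySQL's XID type).
using Xid = std::uint64_t;

// What recovery should do with one prepared transaction.
enum class Resolution { Commit, Rollback };

struct Decision {
    Xid xid;
    Resolution resolution;
};

// For every transaction the engine reports as prepared after a crash,
// decide its fate: commit it if its XID was found in the binary log,
// roll it back otherwise. XIDs that appear only in the binary log are
// ignored here, since the engine has nothing pending for them.
std::vector<Decision> resolve_prepared(const std::vector<Xid>& prepared_in_engine,
                                       const std::set<Xid>& xids_in_binlog) {
    std::vector<Decision> decisions;
    decisions.reserve(prepared_in_engine.size());
    for (Xid xid : prepared_in_engine) {
        const bool logged = xids_in_binlog.count(xid) > 0;
        decisions.push_back(
            {xid, logged ? Resolution::Commit : Resolution::Rollback});
    }
    return decisions;
}
```

With prepared transactions 1, 2, 3 in the engine and XIDs 2, 3, 9 in the
binary log, the sketch rolls back 1 and commits 2 and 3. This is why an
engine that skips the sync on commit loses nothing: the prepare was made
durable, and the binary log decides the outcome.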