You're absolutely right!
Indeed, a storage engine is expected to fsync its log both on prepare
and on commit. Your understanding is correct.
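Put concretely, neither hook may return until its log record is on stable storage. Here is a minimal POSIX sketch of that requirement. It assumes a simple append-only text log. `log_append_durable`, `engine_prepare` and `engine_commit` are hypothetical names standing in for an engine's handlerton->prepare and handlerton->commit, not server API:

```cpp
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: append one record to the engine's log and force it
// to stable storage. Returns true only if write() and fsync() both succeed.
bool log_append_durable(int fd, const std::string &record) {
  const char *p = record.data();
  size_t left = record.size();
  while (left > 0) {
    ssize_t n = write(fd, p, left);
    if (n < 0) return false;  // a production version would retry on EINTR
    p += n;
    left -= static_cast<size_t>(n);
  }
  return fsync(fd) == 0;  // the record is durable only after fsync succeeds
}

// Hypothetical stand-in for handlerton->prepare: sync before returning,
// so recovery can find the prepared transaction after a crash.
bool engine_prepare(int log_fd, unsigned long xid) {
  return log_append_durable(log_fd, "PREPARE " + std::to_string(xid) + "\n");
}

// Hypothetical stand-in for handlerton->commit: under the current rules
// this record must be synced as well.
bool engine_commit(int log_fd, unsigned long xid) {
  return log_append_durable(log_fd, "COMMIT " + std::to_string(xid) + "\n");
}
```

The cost being discussed is the second fsync, the one in `engine_commit`.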
But we're going to weaken this requirement in MariaDB (not surprisingly
:). We discussed the solution just a month ago, and the
corresponding task is "In Progress". I'll check what exactly is going on
when I get back home (it would have been nice to see you; I'm sorry you
couldn't make it to Santa Clara this year).
On Apr 14, Zardosht Kasheff wrote:
> Hello all,
> Storage engines that support two-phase commit implement the functions
> handlerton->prepare and handlerton->commit. Clearly, a storage engine
> must fsync its own log after handlerton->prepare so that if we crash,
> it may report the prepared transaction to MySQL via
> My question is this: Do we need to fsync our log after
> handlerton->commit, or can we somehow be guaranteed that if we do not
> fsync, upon a crash, MySQL will have enough information to call
> handlerton->commit_by_xid on what is a prepared transaction in our
> storage engine?
> What I do not understand is what needs to happen after
> handlerton->commit. Ideally, we would like to not have to fsync after
> handlerton->commit, so that we save an fsync. However, looking at the code,
> it seems that an fsync is necessary, at least in the case where there
> is no binary log. In ha_commit_trans, I see:
>   error=ha_commit_one_phase(thd, all) ? (cookie ? 2 : 1) : 0;
>   DBUG_EXECUTE_IF("crash_commit_before_unlog", DBUG_SUICIDE(););
>   if (cookie)
>     if(tc_log->unlog(cookie, xid))
> If we crash after tc_log->unlog before our log has been fsynced, we
> will not recover properly.
> So, my questions are:
> - Is the same true when a binary log exists? Do we need to fsync
> our log on commit?
> - Is my understanding of the non-binary log case correct in that we
> need to fsync our log on commit?
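Here is a toy model of the failure described above, following the ordering quoted from ha_commit_trans: prepare, coordinator log, engine commit, tc_log->unlog. It is a sketch under stated assumptions, not the server's logic. It assumes that recovery commits every prepared xid the coordinator log still holds (as handlerton->commit_by_xid would) and rolls back the rest. `Engine`, `TcLog`, `recover` and `commit_then_crash` are hypothetical names. A real engine log and tc_log are far more involved.

```cpp
#include <map>
#include <set>

enum class TxState { Prepared, Committed, RolledBack };

// Hypothetical engine log: records are buffered in `pending` until an
// fsync moves everything written so far into `durable`.
struct Engine {
  std::map<unsigned long, TxState> durable;  // survives a crash
  std::map<unsigned long, TxState> pending;  // written but not yet fsynced

  void append(unsigned long xid, TxState s, bool fsync_now) {
    pending[xid] = s;
    if (fsync_now) {  // fsync flushes every earlier record too
      for (const auto &kv : pending) durable[kv.first] = kv.second;
      pending.clear();
    }
  }
  void crash() { pending.clear(); }  // unsynced records are lost
};

// Hypothetical coordinator log: xids it has decided to commit, kept
// until unlog() is called.
struct TcLog {
  std::set<unsigned long> logged;
  void log(unsigned long xid) { logged.insert(xid); }
  void unlog(unsigned long xid) { logged.erase(xid); }
};

// Assumed recovery rule: a prepared xid is committed if the coordinator
// still has it, otherwise it is rolled back.
void recover(Engine &e, const TcLog &tc) {
  for (auto &kv : e.durable)
    if (kv.second == TxState::Prepared)
      kv.second = tc.logged.count(kv.first) ? TxState::Committed
                                            : TxState::RolledBack;
}

// One commit, then a crash right after unlog; returns the final state.
TxState commit_then_crash(bool fsync_on_commit) {
  Engine e;
  TcLog tc;
  const unsigned long xid = 42;
  e.append(xid, TxState::Prepared, /*fsync_now=*/true);  // prepare: always synced
  tc.log(xid);                                           // coordinator decides COMMIT
  e.append(xid, TxState::Committed, fsync_on_commit);    // engine commit
  tc.unlog(xid);                                         // tc_log->unlog(cookie, xid)
  e.crash();
  recover(e, tc);
  return e.durable.at(xid);
}
```

`commit_then_crash(false)` ends in `RolledBack` even though the commit was already reported as successful, while `commit_then_crash(true)` ends in `Committed`. In this model the unsynced commit would also survive if `tc.unlog(xid)` were deferred until the engine's commit record is known to be durable.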