List: Internals
From: Mats Kindahl
Date: January 31 2013 9:12pm
Subject: Re: reducing fsyncs during handlerton->prepare and handlerton->commit in 5.6

On 01/31/2013 05:09 PM, Zardosht Kasheff wrote:
> Thanks a lot Kristian and Mats.
>
> I am learning that I know a lot less than I thought I knew. To help my
> understanding, I will focus in this thread on MySQL 5.6. I will focus
> on MariaDB in another thread.
>
> I want to make sure my understanding is correct. Is what I write below accurate?
>
> In MySQL 5.5, we have the following APIs:
>  - handlerton->prepare
>  - handlerton->commit
>  - handlerton->flush_logs.
> We need to fsync on prepare and commit. But what does flush_logs need
> to do? According to comments, flush_logs runs a checkpoint on the
> system, which is pretty expensive. Is this accurate?

The flush_logs() function is called before rotating the binary log and
when doing an explicit FLUSH LOGS. It gives the storage engine a chance
to flush any in-memory buffers, so yes, it does a checkpoint and it is
expensive. However, it happens only once per binary log.

This means that each time a binary log is rotated, the system grinds to
a halt, which is not very nice. We have been discussing ways to avoid this.

>
> In MySQL 5.6, we have the same APIs, but the contract has changed. We
> still always fsync on prepare, that remains the same. For commit, if
> HA_IGNORE_DURABILITY is set, we should not fsync, otherwise we may
> have poor performance.

Not poorer than without group commits, but yes, if you sync with every
commit, you will have poor performance.

> If HA_IGNORE_DURABILITY is not set, then we
> must fsync on commit.

Correct. The HA_IGNORE_DURABILITY flag says that the server "handles the
durability".
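
To make the contract concrete, here is a minimal sketch of how an engine
might honour the flag. It is not taken from the server or from any engine:
RecoveryLog, Durability, engine_prepare() and engine_commit() are
hypothetical names invented for the illustration, and sync() stands in for
a real fsync() of the engine's recovery log.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the server's durability hint. The thread calls the flag
// HA_IGNORE_DURABILITY; this enum is defined locally for the sketch only.
enum class Durability { Regular, IgnoreDurability };

// Hypothetical engine recovery log: records are buffered in memory and
// become durable only when sync() is called (fsync() in a real engine).
class RecoveryLog {
public:
  void append(const std::string &record) { buffer_.push_back(record); }

  void sync() {
    assert(durable_count_ <= buffer_.size());
    if (durable_count_ == buffer_.size()) return;  // nothing new to flush
    durable_count_ = buffer_.size();
    ++sync_calls_;
  }

  std::size_t durable_count() const { return durable_count_; }
  std::size_t sync_calls() const { return sync_calls_; }

private:
  std::vector<std::string> buffer_;
  std::size_t durable_count_ = 0;  // number of records safely on disk
  std::size_t sync_calls_ = 0;     // how many real syncs were issued
};

// prepare: always made durable, whatever the flag says.
void engine_prepare(RecoveryLog &log, int xid) {
  log.append("PREPARE " + std::to_string(xid));
  log.sync();
}

// commit: sync only when the server has not taken over durability.
void engine_commit(RecoveryLog &log, int xid, Durability durability) {
  log.append("COMMIT " + std::to_string(xid));
  if (durability == Durability::Regular) log.sync();
}
```

With the flag set, a prepare followed by a commit costs one sync instead of
two; the commit record stays in the buffer until something else syncs the
log.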

> I do not know what flush_logs needs to do.
>
> My last question is the following:
>  - what should flush_logs do?
>  - what is the purpose/contract of flush_logs? Under what scenarios is
> it meant to be called?

flush_logs() should create a checkpoint, just as you said above. It is
called on binary log rotation and on explicit FLUSH LOGS (and also on
ALTER TABLE under some circumstances: see sql_table.cc).

>
> The comments in MySQL 5.5 and 5.6 imply that a checkpoint is run on
> the system. This is what we do as well. This sounds expensive, because
> IIUC, a checkpoint writes all dirty nodes to disk. But looking at the
> implementation, it seems that flush_logs only ensures that the redo
> log is synced up to the proper lsn, and if not, syncs it. Essentially,
> it just fsyncs the log.

I assume that you have been looking at InnoDB. The reason a checkpoint
is done (by flushing the log) is that the recovery procedure only looks
in the last binary log. Without the flush, you might lose committed
transactions on a crash, which is not OK.

If you have prepared and committed some transaction but do not flush the
log to disk before a rotate, then on recovery the engine might list it
as prepared (because the commit record was not written to disk). The
recovery procedure will not find it in the binary log (because it is
not in the last one, it is in the preceding one) and hence it will be
rolled back.

Poof! Transaction gone.
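
The scenario can be played through in a small simulation. This is only a
sketch under simplifying assumptions, not the server's recovery code:
EngineState, BinlogFile, recover() and crash_after_rotate() are
hypothetical names, a binary log file is modelled as a set of XIDs, and a
crash is modelled by recovery ignoring everything that was only in memory.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: a binary log file is just the set of XIDs written to it.
using BinlogFile = std::set<int>;

struct EngineState {
  std::set<int> prepared_durable;     // prepare records on disk
  std::set<int> committed_durable;    // commit records on disk
  std::set<int> committed_in_memory;  // commit records not yet synced
};

// flush_logs(): make every pending commit record durable.
void flush_logs(EngineState &engine) {
  engine.committed_durable.insert(engine.committed_in_memory.begin(),
                                  engine.committed_in_memory.end());
  engine.committed_in_memory.clear();
}

// Simulated crash recovery: only durable engine state and only the LAST
// binary log file are consulted. Returns the XIDs that get rolled back.
std::vector<int> recover(const EngineState &engine,
                         const std::vector<BinlogFile> &binlogs) {
  assert(!binlogs.empty());
  const BinlogFile &last = binlogs.back();
  std::vector<int> rolled_back;
  for (int xid : engine.prepared_durable) {
    if (engine.committed_durable.count(xid)) continue;  // already committed
    if (last.count(xid)) continue;  // found in last binlog: gets committed
    rolled_back.push_back(xid);     // prepared, not in last binlog: lost
  }
  return rolled_back;
}

// One transaction is prepared, written to the binlog and committed without
// a sync; then the binlog rotates and the server crashes.
std::vector<int> crash_after_rotate(bool call_flush_logs) {
  EngineState engine;
  std::vector<BinlogFile> binlogs(1);  // start with one binlog file
  const int xid = 42;

  engine.prepared_durable.insert(xid);     // prepare is always synced
  binlogs.back().insert(xid);              // transaction written to binlog
  engine.committed_in_memory.insert(xid);  // commit record not yet on disk

  if (call_flush_logs) flush_logs(engine);  // what the server asks for
  binlogs.emplace_back();                   // rotate to a new, empty binlog

  return recover(engine, binlogs);          // crash: in-memory state ignored
}
```

Calling crash_after_rotate(false) returns the XID as rolled back, while
crash_after_rotate(true) returns nothing, because the commit record reached
disk before the rotation.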

/Matz

>
> Is this accurate? If so I think we need to modify our engine to not
> checkpoint and just fsync our recovery log.
>
> Thanks
> -Zardosht
>
> On Thu, Jan 31, 2013 at 4:41 AM, Kristian Nielsen
> <knielsen@stripped> wrote:
>> Mats Kindahl <mats.kindahl@stripped> writes:
>>
>>> In MySQL 5.6, there are no new APIs that you *have* to comply with.
>> But MySQL 5.6 serialises calls to the commit handlerton method, the next one
>> cannot start before the previous one completes. So if you did fsync() with
>> group commit before in commit, your group commit will no longer work in 5.6
>> and you will get a serious performance regression if you do not honour the
>> HA_IGNORE_DURABILITY flag. I consider that breaking the storage engine API, as
>> you see Mats and I disagree a bit on that point :-)
>>
>> Anyway, it should be easy to do for you. MySQL 5.6 sets HA_IGNORE_DURABILITY,
>> this has similar semantics to when MariaDB 5.3+ calls the commit_ordered()
>> method. So you can probably use the same code for both with a small amount of
>> #ifdef.
>>
>> Note that for MySQL 5.6 you need to implement also the flush_logs() method to
>> fsync() all prior commits durably to disk (if you did not already implement
>> it). What happens is basically that in MySQL crash recovery looks only at the
>> last binlog file written. So it calls flush_logs() before creating a new
>> binlog, and storage engine must ensure that all commits become durable at that
>> point. Otherwise commits may be lost if a crash happens just after binlog
>> rotation.
>>
>> This is actually the _only_ reason that fsync() was ever needed in commit, to
>> ensure that it is done when binlog is rotated. So it is rather silly that we
>> have done it for _every_ commit for so long. Anyway, it will be fixed now.
>>
>>  - Kristian.

-- 
Senior Principal Software Developer
Oracle, MySQL Department
