List: Maria Storage Engine
From: Michael Widenius
Date: May 7 2008 10:28am
Subject: Re: Plans to make transaction log flush unserialised

>>>>> "Sanja" == Sanja Byelkin <sanja@stripped> writes:

Sanja> Hi!
Sanja> Short description of loghandler buffers
Sanja> =======================================

Sanja> The loghandler has a ring of buffers (4-5 buffers, 1-2MB each).

Sanja> The flush procedure gets an LSN (Log Serial Number, in our case just an
Sanja> address in the log) up to which the log should be flushed to disk.

Sanja> Now
Sanja> ===

Sanja> The first thread that comes to flush locks the serialisation mutex, checks
Sanja> that the requested LSN is not already flushed, then goes through the buffer
Sanja> list and flushes the buffers one by one until the LSN is flushed. If it
Sanja> reaches the current buffer, that buffer is forced closed for writing (its
Sanja> last page is copied to the next buffer so that logging can continue there).

Why copy the buffer?
Why not just write the buffer to disk and keep a pointer into the buffer
to the first byte that is not yet flushed?
(It would be good to avoid a memory copy of an 8K block.)
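
To make the suggestion concrete, here is a minimal sketch in Python (chosen
only for brevity). The class LogBuffer and its fields are hypothetical and
not taken from the loghandler code, and the sketch ignores page alignment:
a real log would have to rewrite the partially filled last page on the next
flush.

```python
import io


class LogBuffer:
    """Hypothetical log buffer that remembers how much of it is on disk."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.filled = 0    # number of bytes appended so far
        self.flushed = 0   # offset of the first byte not yet written out

    def append(self, record):
        end = self.filled + len(record)
        if end > len(self.data):
            raise ValueError("buffer full")
        self.data[self.filled:end] = record
        self.filled = end

    def flush(self, out):
        """Write only the unflushed tail; the buffer stays open for appends."""
        chunk = bytes(self.data[self.flushed:self.filled])
        out.write(chunk)
        self.flushed = self.filled
        return len(chunk)


# Usage: two flushes of the same buffer, no page copied to another buffer.
log_file = io.BytesIO()          # stands in for the log file
buf = LogBuffer(8192)
buf.append(b"commit-1;")
buf.flush(log_file)
buf.append(b"commit-2;")         # appended after the first flush
buf.flush(log_file)
```

The second flush writes only the bytes added since the first one, so nothing
has to be copied to the next buffer.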

What happens with threads that want to write a log entry to the last
buffer while the flush is in progress?  Do they have to wait until the
whole flush is done, or do they only have to wait if the buffer they
want to use is in the flush stage?

Sanja> When the flush is done it unlocks the serialisation mutex.

Sanja> Plans
Sanja> =====

Sanja> First leader
Sanja> ------------

Sanja> The first thread that comes to flush (the leader of this flush pass)
Sanja> determines which buffers it should flush (beginning and ending / min and
Sanja> max) and sets them as the goal of the current pass.

Is the 'goal' a struct that is protected by its own mutex?

Sanja> Then it goes through the buffer list from min to max and flushes them one
Sanja> by one, skipping already flushed buffers (see 'Flush pass' for details).
Sanja>  (Note: the first page of a buffer can overlap the last page of the
Sanja>  previous buffer (this happens when we force the current buffer to
Sanja>  close). If both are in the current plan, the thread that flushes 'the
Sanja>  previous buffer' will not write the last page at all, because it is an
Sanja>  old version of the page.)

Do we have a worklog that describes in detail how the buffer
management is done regarding the last buffer?

Sanja> When it reaches the last buffer it will:
Sanja>  - wait of finishing other thread which.

What do you mean by the above?

Sanja>  - sync file
Sanja>  - update information about flushed data
Sanja>  - inform threads which are waiting about the pass end.

Sanja> Other threads
Sanja> -------------

Sanja> A thread that comes and sees that a flush is already in progress
Sanja> - finds the buffers which should be flushed; if the current pass will not
Sanja>   satisfy this thread, it checks the information about the next pass, and
Sanja>   if it needs more than is already registered there, it updates that
Sanja>   information and sets itself as leader of the next pass (like the first
Sanja>   thread that comes to flush).

I assume that in some cases the thread will notice that everything it
needs is already flushed and it can then return at once?
(Not a common case, as the thread has just written a commit record, but
still a possible one.)

Why do we need to calculate which buffers should be flushed?
Isn't it enough to register the max LSN of all waiting threads and
make the thread with the max LSN the leader of the next flush?
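
A minimal sketch of that registration, in Python for brevity. NextPass,
register() and the thread names are hypothetical, and the single mutex
around the record is an assumption, not something the plan specifies.

```python
import threading


class NextPass:
    """Hypothetical record of the next flush pass: one goal LSN, one leader."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.max_lsn = 0      # highest LSN requested by any waiting thread
        self.leader = None    # thread that asked for max_lsn

    def register(self, thread_id, lsn):
        """Record a request; return True if the caller became the leader."""
        with self.mutex:
            if lsn > self.max_lsn:
                self.max_lsn = lsn
                self.leader = thread_id
                return True
            return False


# Usage: three waiting threads register in arrival order.
next_pass = NextPass()
next_pass.register("t1", 100)   # first request, t1 leads
next_pass.register("t2", 50)    # already covered by t1's goal
next_pass.register("t3", 150)   # needs more, takes over as leader
```

With only one LSN stored, no waiting thread has to walk the buffer list
just to register itself.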

Sanja> - then it will take part in flushing buffers (see 'Flush pass' for
Sanja>   details).

Isn't this a common scenario:

- Thread calculates which buffers to flush
- Waits for the leader's flush pass to finish
- Wakes up
- Notices that everything up to its current LSN is already flushed and returns

If this is the case, we should in many cases be able to totally avoid
the flush pass and taking any mutex for the different log buffers.
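
The scenario above can be sketched as follows, in Python for brevity.
FlushState, wait_for() and finish_pass() are hypothetical names, and using a
single condition variable for all waiters is an assumption made for the
example.

```python
import threading


class FlushState:
    """Hypothetical shared state: the flushed LSN plus a 'pass running' flag."""

    def __init__(self):
        self.cond = threading.Condition()
        self.flushed_lsn = 0
        self.pass_running = False

    def wait_for(self, lsn):
        """Wait while a pass is running and lsn is not yet flushed.

        Returns True if lsn is flushed (the caller can return at once),
        False if the caller still has to flush for itself.
        """
        with self.cond:
            while self.pass_running and self.flushed_lsn < lsn:
                self.cond.wait()
            return self.flushed_lsn >= lsn

    def finish_pass(self, new_flushed_lsn):
        """Called by the leader after the sync; wakes every waiting thread."""
        with self.cond:
            self.flushed_lsn = max(self.flushed_lsn, new_flushed_lsn)
            self.pass_running = False
            self.cond.notify_all()


# Usage: a waiter asks for LSN 50 while the leader's pass ends at LSN 80.
state = FlushState()
state.pass_running = True
outcome = []
waiter = threading.Thread(target=lambda: outcome.append(state.wait_for(50)))
waiter.start()
state.finish_pass(80)
waiter.join()
```

Here the waiter only touches the one lock behind the condition variable and
never takes a log buffer mutex.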

Sanja> - after which it:
Sanja>   - returns if all needed buffers are flushed (buffers rather than the
Sanja>     LSN, because the LSN passed to the flush call can be an LSN from the
Sanja>     future, which means all current buffers)
Sanja>   - becomes leader of the next pass (if it is still registered as leader)
Sanja>   - takes part in the next flush pass

When we flush the last buffer, we could increase the flush LSN to the
current maximum LSN.  This way we can release a lot of threads without
having them do a flush pass.
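
A small sketch of the effect, in Python for brevity. The function
split_waiters() and its parameters are hypothetical; horizon_lsn stands for
the maximum LSN in the log, sampled before the write of the last buffer
starts (an assumption needed so that the reported LSN is really on disk).

```python
def split_waiters(waiting_lsns, goal_lsn, horizon_lsn, flushed_last_buffer):
    """Return (flushed_lsn, released, remaining) at the end of a pass.

    If the pass wrote the last (current) buffer, everything up to
    horizon_lsn is on disk, so the flushed LSN can jump to it.
    Otherwise only the pass goal is guaranteed.
    """
    flushed_lsn = horizon_lsn if flushed_last_buffer else goal_lsn
    released = [lsn for lsn in waiting_lsns if lsn <= flushed_lsn]
    remaining = [lsn for lsn in waiting_lsns if lsn > flushed_lsn]
    return flushed_lsn, released, remaining


# Usage: three waiters, pass goal 100, log currently ends at LSN 160.
waiters = [90, 120, 150]
with_bump = split_waiters(waiters, 100, 160, flushed_last_buffer=True)
without_bump = split_waiters(waiters, 100, 160, flushed_last_buffer=False)
```

In this example all three waiters are released when the flushed LSN is
raised to the horizon, and only the first one when it stays at the goal.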

Sanja> Flush process status
Sanja> --------------------

Sanja>  - FREE: no thread is flushing
Sanja>  - LEADER_DETECT: a leader has come but the goal is not determined yet
Sanja>    (other threads will just wait)
Sanja>  - FLUSH_PASS: flushing of buffers is in progress
Sanja>  - FLUSH_FINISHING: all buffers are being flushed; we are waiting for the
Sanja>    end of the process
Sanja>  - FLUSH_END: flush finished; the leader is syncing and updating statistics
Sanja>  - NEW_LEADER: switching to the new leader registered for the next pass
Sanja>    (if we need a new pass) (maybe it will be merged with LEADER_DETECT).
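
The statuses listed above could be written down as follows, in Python for
brevity. The enum and the switch() helper are hypothetical, and the
transition table is only an assumption read off the status descriptions; in
particular the edge out of NEW_LEADER is a guess.

```python
from enum import Enum


class FlushStatus(Enum):
    FREE = 0
    LEADER_DETECT = 1
    FLUSH_PASS = 2
    FLUSH_FINISHING = 3
    FLUSH_END = 4
    NEW_LEADER = 5


# Assumed transitions, inferred from the descriptions and not confirmed.
ALLOWED = {
    FlushStatus.FREE: {FlushStatus.LEADER_DETECT},
    FlushStatus.LEADER_DETECT: {FlushStatus.FLUSH_PASS},
    FlushStatus.FLUSH_PASS: {FlushStatus.FLUSH_FINISHING},
    FlushStatus.FLUSH_FINISHING: {FlushStatus.FLUSH_END},
    FlushStatus.FLUSH_END: {FlushStatus.FREE, FlushStatus.NEW_LEADER},
    FlushStatus.NEW_LEADER: {FlushStatus.FLUSH_PASS},  # a guess
}


def switch(current, target):
    """Return target if the assumed table allows current -> target."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal switch {current.name} -> {target.name}")
    return target


# Usage: one full pass with no new leader registered.
status = FlushStatus.FREE
for step in (FlushStatus.LEADER_DETECT, FlushStatus.FLUSH_PASS,
             FlushStatus.FLUSH_FINISHING, FlushStatus.FLUSH_END,
             FlushStatus.FREE):
    status = switch(status, step)
```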

Assuming we store the flush LSN as the goal, we may not need a

Sanja> Statuses graph
Sanja> --------------

Sanja>                +------------- NEW_LEADER <---------------------------+
Sanja>                |                                                     |
Sanja>                V                                                     |
Sanja>  ^                                                                   |
Sanja>  |                                                                   |
Sanja>  +-------------------------------------------------------------------+

Sanja> Flush pass
Sanja> ----------

Sanja> - increase the number of participants and start the loop from the min buffer

Can we have more than one flushing thread active at once?

Sanja>   - if the buffer is flushed skip to the next one, if not then flush it
Sanja> - when the max buffer of the current pass goal is processed: decrease the
Sanja>   participant counter; if the status is not FLUSH_FINISHING change it. If
Sanja>   the counter is 0 switch to FLUSH_END.
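
A minimal sketch of the flush pass described above, in Python for brevity.
It assumes that several threads may take part at once and that one mutex
protects the buffer states, the participant counter and the status; the
class FlushPass and the three buffer states are hypothetical.

```python
import threading

DIRTY, FLUSHING, FLUSHED = "dirty", "flushing", "flushed"


class FlushPass:
    """Hypothetical flush pass over buffers lo..hi shared by several threads."""

    def __init__(self, n_buffers, lo, hi):
        self.mutex = threading.Lock()
        self.state = [DIRTY] * n_buffers
        self.lo, self.hi = lo, hi
        self.participants = 0
        self.status = "FLUSH_PASS"

    def participate(self, write_buffer):
        with self.mutex:
            self.participants += 1
        for i in range(self.lo, self.hi + 1):
            with self.mutex:
                if self.state[i] != DIRTY:
                    continue              # flushed or claimed by another thread
                self.state[i] = FLUSHING  # claim it, then write without the mutex
            write_buffer(i)
            with self.mutex:
                self.state[i] = FLUSHED
        with self.mutex:
            self.participants -= 1
            if self.status == "FLUSH_PASS":
                self.status = "FLUSH_FINISHING"
            if self.participants == 0:
                self.status = "FLUSH_END"


# Usage: four threads share one pass over buffers 1..3 of a five-buffer ring.
written = []
flush_pass = FlushPass(5, 1, 3)
threads = [threading.Thread(target=flush_pass.participate, args=(written.append,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each buffer in the goal is written exactly once, whichever thread claims it,
and the status ends at FLUSH_END once the last participant has left.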

Sanja> Concerns
Sanja> --------

Sanja> Switching states and passing through the buffers can take a lot of time
Sanja> because of the different mutex acquisitions (but I think it is nothing
Sanja> compared to IO time).

Can you add which mutexes you will need and when you will take and
release them?

In general things look fine. I just want to understand the details to
know if my suggestions make any sense.
