On Wed, May 07, 2008 at 01:28:55PM +0300, Michael Widenius wrote:
> >>>>> "Sanja" == Sanja Byelkin <sanja@stripped> writes:
> Sanja> Hi!
> Sanja> Short description of loghandler buffers
> Sanja> =======================================
> Sanja> The loghandler has a circle of buffers (1-2MB each, 4-5 buffers).
> Sanja> The flush procedure gets an LSN (Log Serial Number, in our case just an
> Sanja> address in the log) up to which the log should be flushed to disk.
> Sanja> Now
> Sanja> ===
> Sanja> The first thread which comes to flush locks the serialisation mutex,
> Sanja> checks that the requested LSN is not already flushed, then goes through
> Sanja> the buffer list and flushes the buffers one by one until the LSN is
> Sanja> flushed. If it reaches the current buffer, that buffer is forcibly
> Sanja> closed for writing (its last page is copied to the next buffer so that
> Sanja> the page can be continued there).
> Why copy the buffer?
> Why not just write the buffer to disk and keep a pointer into the buffer
> to the first byte which is not yet flushed?
> (It would be good to avoid a memory copy of an 8K block.)
1) Because of CRC and sector protection. Without them it would be possible
to avoid copying (there is even a TODO in the code about it), but that is
a speed optimisation, which is not our goal now.
2) Also we should not forget about filling the unused part of the page (it
is not really a problem, but it has to be mentioned so that we do not
forget it).
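A minimal sketch of why the last page gets copied, assuming a plain CRC32 over the whole page (zlib's, not the real loghandler checksum) and a hypothetical 0xFF filler byte; sector protection is not modelled. The page written to disk must be complete and checksummed, so the copy that writers keep appending to has to live in the next buffer.

```python
import zlib

PAGE_SIZE = 8192   # 8K log pages, as mentioned in the thread
FILLER = 0xFF      # hypothetical filler byte for the unused tail of a page


def close_buffer(buf, used, page_size=PAGE_SIZE):
    """Finalise the partially filled last page of a force-closed buffer.

    `used` is the number of bytes written into `buf` so far. Returns
    (page_for_disk, crc, continuation):
      - page_for_disk: the last page padded with FILLER so it is complete
      - crc: CRC32 of that padded page (stands in for the page checksum)
      - continuation: a copy of the partial page that becomes the first
        page of the next buffer, where writers keep appending
    """
    if used % page_size == 0:
        raise ValueError("last page is already full; nothing to close")
    page_start = (used // page_size) * page_size
    partial = bytes(buf[page_start:used])
    continuation = bytearray(partial)          # the memory copy discussed above
    padded = partial + bytes([FILLER]) * (page_size - len(partial))
    return padded, zlib.crc32(padded), continuation
```

Avoiding the copy would mean checksumming a page that is still being filled, which is exactly what the CRC rules out.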
> What happens with threads that want to write a log entry to the last
> buffer while the flush is in progress? Do they have to wait until the
> whole flush is done, or do they only have to wait if the buffer they
> want to use is in the flush stage?
As soon as we have started a new buffer (at the beginning of the flush
process), it is open for adding information while other threads flush
the other buffers.
> Sanja> When the flush is done, it unlocks the serialisation mutex.
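The current scheme can be sketched roughly like this. It is a simplification, assuming each buffer is just its end address in the log (hypothetical `buffer_ends`) and that `flushed` means "everything below this address is on disk"; force-closing the current buffer is left out.

```python
import threading


class SerialFlusher:
    """Sketch of the current scheme: one serialisation mutex, buffers in order."""

    def __init__(self, buffer_ends):
        # buffer_ends[i]: log address just past the data in buffer i (hypothetical)
        self.buffer_ends = buffer_ends
        self.flushed = 0          # everything below this address is on disk
        self.next_buf = 0         # index of the oldest unflushed buffer
        self.serial = threading.Lock()
        self.writes = []          # which buffers were written, for illustration

    def flush(self, upto):
        """Make everything below address `upto` durable."""
        with self.serial:         # only one flusher at a time
            if upto <= self.flushed:
                return            # already on disk: nothing to do
            while self.flushed < upto and self.next_buf < len(self.buffer_ends):
                self.writes.append(self.next_buf)   # stands in for write()
                self.flushed = self.buffer_ends[self.next_buf]
                self.next_buf += 1
```

Every other caller queues on `serial` even when its data is already covered, which is what the plan below tries to improve.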
> Sanja> Plans
> Sanja> =====
> Sanja> First leader
> Sanja> ------------
> Sanja> The first thread which comes to flush (the leader of this flush pass)
> Sanja> detects which buffers it should flush (beginning and end / min and
> Sanja> max) and sets them as the goal of the current pass.
> Is the 'goal' a struct that is protected by its own mutex?
I thought about using the loghandler lock for changing the flush state, so
the goal will not need its own protection: only one thread can change it,
and other threads wait for the state change while this one changes the goal.
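One way to express "the goal needs no lock of its own" is a single lock plus a condition variable, sketched below. `FlushControl`, `set_state` and `wait_for` are hypothetical names; the Python `Lock` stands in for the loghandler lock.

```python
import threading


class FlushControl:
    """Flush state and goal guarded by one lock (stand-in for the loghandler lock)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state_changed = threading.Condition(self.lock)
        self.state = "FREE"
        self.goal = None          # (min_buffer, max_buffer) of the current pass

    def set_state(self, state, goal=None):
        # Only the thread driving the state machine calls this, under the lock.
        with self.lock:
            self.state = state
            if goal is not None:
                self.goal = goal
            self.state_changed.notify_all()

    def wait_for(self, wanted, timeout=None):
        # Other threads sleep here until the state they care about is reached;
        # they read `goal` only while holding the same lock.
        with self.lock:
            return self.state_changed.wait_for(lambda: self.state == wanted, timeout)
```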
> Sanja> Then it goes through the buffer list from min to max and flushes them
> Sanja> one by one, skipping already flushed buffers (see 'Flush pass' for
> Sanja> details).
> Sanja> (Note: a buffer can have its first page overlapping the last page of
> Sanja> the previous buffer (this happens when we force the current buffer to
> Sanja> close). If both are in the current plan, the thread which flushes the
> Sanja> previous buffer will not write that last page at all, because it is an
> Sanja> old version of the page.)
> Do we have a worklog that describes in detail how the buffer
> management is done regarding the last buffer?
See the "Log Handler Buffer Design" section.
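The overlap rule in the note above can be sketched as a function that computes what each buffer of a plan writes. The `(start, end, last_page_copied)` tuple layout is an assumption for illustration, not the real buffer structure.

```python
PAGE = 8192


def write_ranges(buffers, lo, hi, page=PAGE):
    """Return the (start, end) address range each buffer in plan [lo, hi] writes.

    buffers[i] = (start, end, last_page_copied), where last_page_copied is True
    when buffer i was force-closed and its last page continues in buffers[i + 1].
    """
    ranges = []
    for i in range(lo, hi + 1):
        start, end, copied = buffers[i]
        if copied and i + 1 <= hi:
            # The next buffer in this pass holds a newer version of the page
            # containing our last byte, so stop at the start of that page.
            end = (end - 1) // page * page
        ranges.append((start, end))
    return ranges
```

If the next buffer is not in the plan, the old version of the page is still written, so the flushed data stays contiguous.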
> Sanja> When it reaches the last buffer it will:
> Sanja> - wait of finishing other thread which.
> What do you mean by the above?
- wait for the other threads which are flushing buffers to finish.
> Sanja> - sync the file
> Sanja> - update the information about flushed data
> Sanja> - inform the waiting threads that the pass has ended.
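The four end-of-pass steps could look roughly like this on the leader's side. It is a sketch only: `PassEnd`, `leave` and `finish` are hypothetical, and `os.fsync` on a plain file descriptor stands in for the log file sync.

```python
import os
import threading


class PassEnd:
    """Leader-side end of a flush pass (sketch, hypothetical names)."""

    def __init__(self, fd):
        self.fd = fd
        self.cond = threading.Condition()
        self.participants = 0     # threads still flushing buffers of this pass
        self.flushed = 0          # address up to which the log is durable
        self.pass_done = False

    def leave(self):
        # Called by each helper thread when its share of the pass is written.
        with self.cond:
            self.participants -= 1
            self.cond.notify_all()

    def finish(self, pass_goal_end):
        with self.cond:
            # 1. wait for the other threads which are still flushing buffers
            self.cond.wait_for(lambda: self.participants == 0)
        # 2. sync the file, without holding the lock
        os.fsync(self.fd)
        with self.cond:
            # 3. update the information about flushed data
            self.flushed = max(self.flushed, pass_goal_end)
            # 4. wake the threads waiting for the end of the pass
            self.pass_done = True
            self.cond.notify_all()
```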
> Sanja> Other threads
> Sanja> -------------
> Sanja> A thread which comes and sees that the flush process has already started:
> Sanja> - finds the buffers which should be flushed; if the current pass will
> Sanja> not satisfy this thread, it checks the information about the next pass,
> Sanja> and if it needs more than is already registered there, it updates that
> Sanja> information and sets itself as leader of the next pass (like the first
> Sanja> thread which comes to flush).
> I assume that in some cases the thread will notice that everything it
> needs is already flushed and it can then return at once?
> (Not a common case, as the thread has just written a commit record, but
> still a possible case.)
Yes, it is done even now, and I am not going to remove it.
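The registration step from the first item of "Other threads" can be sketched as follows, assuming the flush-state lock is already held and that buffers are identified by ever-increasing sequence numbers rather than ring slots (both assumptions). `PassPlan` and `register` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PassPlan:
    max_buffer: int = -1      # last buffer the pass will flush (-1: nothing planned)
    leader: object = None     # thread registered as leader of the pass


def register(current, nxt, needed_buffer, me):
    """Decide what a thread arriving during a flush does (lock assumed held)."""
    if needed_buffer <= current.max_buffer:
        return "wait_current"             # the running pass already covers us
    if needed_buffer > nxt.max_buffer:
        nxt.max_buffer = needed_buffer    # we need more than is registered
        nxt.leader = me                   # so we become leader of the next pass
    return "wait_next"
```

A later thread that needs even more takes over the leadership of the next pass, which is why the "if it is still registered as leader" check is needed after waking up.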
> Why do we need to calculate which buffers should be flushed?
> Isn't it enough to register the max LSN of all waiting threads and
> make the thread with the max LSN the leader of the next flush?
The problem is that the address which the flush process gets may not be a
real LSN but a "horizon" address, so we would need to get the real LSN of
the real last buffer. The real buffer is much simpler to get (in most cases
it is the last buffer).
> Sanja> - then it takes part in flushing buffers (see 'Flush pass' for
> Sanja> details).
> Isn't this a common scenario:
> - Thread calculates which buffers to flush
> - Waits for the leader's flush pass to finish
> - Wakes up
> - Notices that everything up to its current LSN is already flushed and returns
> If this is the case, we should in many cases be able to avoid the flush
> pass entirely, without taking any mutex for the different log buffers.
This is checked at the beginning. If a thread sees that everything is
flushed, it quits without waiting. It waits and does nothing only if it
needs a flush and all buffers are already taken for flushing, so it cannot
do anything helpful and will wait until the end of the flush process.
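The decision a thread makes on entry boils down to three outcomes. A sketch with hypothetical names, using "everything below `flushed` is on disk" as the meaning of the flushed address:

```python
def on_flush_request(upto, flushed, unclaimed_buffers):
    """What a thread does when it asks for everything below `upto` to be flushed.

    flushed: address below which the log is already durable.
    unclaimed_buffers: buffers of the running pass nobody has started writing.
    """
    if upto <= flushed:
        return "return"   # already durable: quit without waiting
    if unclaimed_buffers:
        return "help"     # take part in flushing the remaining buffers
    return "wait"         # everything is taken: wait for the end of the pass
```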
> Sanja> - after which it:
> Sanja> - returns if all needed buffers are flushed (not the LSN, which could
> Sanja> be an LSN from the future at the moment of the flush call, meaning all
> Sanja> current buffers).
> Sanja> - becomes leader of the next pass (if it is still registered as leader)
> Sanja> - takes part in the next flush pass
> When we flush the last buffer, we could increase the flush LSN to the
> current maximum LSN. This way we can release a lot of threads without
> having them do a flush pass.
It is done.
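Raising the flushed address to the end of what was actually synced, rather than to the pass goal, releases every waiter that is covered. A sketch, assuming waiters are kept as hypothetical `(thread_id, upto)` pairs:

```python
def end_of_pass(flushed, synced_end, waiters):
    """Advance the flushed address after the sync and release covered waiters.

    synced_end: end of the data that was written when the sync was issued
    (the current maximum, not just the pass goal).
    waiters: list of (thread_id, upto) pairs.
    Returns (new_flushed, released_ids, still_waiting).
    """
    new_flushed = max(flushed, synced_end)
    released = [t for t, upto in waiters if upto <= new_flushed]
    still_waiting = [(t, upto) for t, upto in waiters if upto > new_flushed]
    return new_flushed, released, still_waiting
```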
> Sanja> Flush process status
> Sanja> --------------------
> Sanja> - FREE: no thread is flushing
> Sanja> - LEADER_DETECT: the leader has come but the goal is not detected yet
> Sanja> (other threads just wait)
> Sanja> - FLUSH_PASS: flushing of buffers is in progress
> Sanja> - FLUSH_FINISHING: all buffers are being flushed; we are waiting for
> Sanja> the end of the process
> Sanja> - FLUSH_END: flush finished; the leader is syncing and updating
> Sanja> statistics
> Sanja> - NEW_LEADER: switching to the new leader registered for the next pass
> Sanja> (if we need a new pass) (maybe it will be merged with LEADER_DETECT).
> Assuming we store the flush LSN as the goal, we may not need a
> LEADER_DETECT or NEW_LEADER phase.
We can't. We should detect which buffers will be flushed in any case,
just to share the work with other threads (if it is not the only buffer).
> Sanja> Statuses graph
> Sanja> --------------
> Sanja>               +------------- NEW_LEADER <---------------------------+
> Sanja>               |                                                     |
> Sanja>               V                                                     |
> Sanja> FREE ---> LEADER_DETECT ---> FLUSH_PASS ---> FLUSH_FINISHING --> FLUSH_END
> Sanja> ^                                                                   |
> Sanja> |                                                                   |
> Sanja> +-------------------------------------------------------------------+
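The statuses and the graph can be encoded as a transition table so that illegal switches are caught early. This is a sketch of our reading of the graph: FLUSH_END goes back to FREE, or to NEW_LEADER when a next pass is registered, and NEW_LEADER is assumed to lead into LEADER_DETECT.

```python
from enum import Enum, auto


class FlushState(Enum):
    FREE = auto()
    LEADER_DETECT = auto()
    FLUSH_PASS = auto()
    FLUSH_FINISHING = auto()
    FLUSH_END = auto()
    NEW_LEADER = auto()


# Allowed transitions (our reading of the statuses graph).
TRANSITIONS = {
    FlushState.FREE: {FlushState.LEADER_DETECT},
    FlushState.LEADER_DETECT: {FlushState.FLUSH_PASS},
    FlushState.FLUSH_PASS: {FlushState.FLUSH_FINISHING},
    FlushState.FLUSH_FINISHING: {FlushState.FLUSH_END},
    FlushState.FLUSH_END: {FlushState.FREE, FlushState.NEW_LEADER},
    FlushState.NEW_LEADER: {FlushState.LEADER_DETECT},
}


def next_state(cur, new):
    """Return `new` if cur -> new is an edge of the graph, else raise."""
    if new not in TRANSITIONS[cur]:
        raise ValueError(f"illegal transition {cur.name} -> {new.name}")
    return new
```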
> Sanja> Flush pass
> Sanja> ----------
> Sanja> - increase the number of participants and start the loop from the min buffer
> Can we have more than one flushing thread active at once?
Yes. I decided that writing to a file is mostly copying data in memory to
the OS file buffer, so many threads can help if we have many CPUs.
> Sanja> - if a buffer is flushed, skip to the next one; if not, flush it
> Sanja> - when the max buffer of the current pass goal is processed: decrease
> Sanja> the participant counter; if the status is not FLUSH_FINISHING, change
> Sanja> it to FLUSH_FINISHING. If the counter is 0, switch to FLUSH_END.
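The flush pass steps above can be sketched with several threads sharing one pass. Assumptions: a single lock guards the shared bookkeeping, a buffer counts as "taken" once any thread has started writing it, and threads join only while the state is still FLUSH_PASS. `FlushPass` and the `write` callback are hypothetical.

```python
import threading


class FlushPass:
    """Several threads cooperatively flushing buffers lo..hi (sketch)."""

    def __init__(self, lo, hi, write):
        self.lock = threading.Lock()
        self.lo, self.hi = lo, hi
        self.write = write        # callback standing in for writing one buffer
        self.taken = set()        # buffers already flushed or being flushed
        self.participants = 0
        self.state = "FLUSH_PASS"

    def take_part(self):
        with self.lock:
            if self.state != "FLUSH_PASS":
                return False      # everything is taken: just wait for the end
            self.participants += 1
        for buf in range(self.lo, self.hi + 1):
            with self.lock:
                if buf in self.taken:
                    continue      # flushed (or in flight) elsewhere: skip it
                self.taken.add(buf)
            self.write(buf)       # the actual I/O runs without the lock
        with self.lock:
            # Max buffer of the goal processed: leave the pass.
            self.participants -= 1
            if self.state != "FLUSH_FINISHING":
                self.state = "FLUSH_FINISHING"
            if self.participants == 0:
                self.state = "FLUSH_END"
        return True
```

Taking the lock only to claim a buffer keeps the I/O itself parallel, which is the point of letting several threads into one pass.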
> Sanja> Concerns
> Sanja> --------
> Sanja> Switching states and passing through buffers can take a lot of time
> Sanja> because of acquiring different mutexes (but I think it is nothing
> Sanja> compared to IO).
> Can you add which mutexes you will need and when you will take and
> release them?
According to my sketch we mostly need a mutex for state switching, for
which I thought of using the loghandler lock.
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Oleksandr Byelkin <sanja@stripped>
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer
/_/ /_/\_, /___/\___\_\___/ Lugansk, Ukraine