On Thu, May 08, 2008 at 08:33:46PM +0300, Michael Widenius wrote:
> Sanja> First leader
> Sanja> ------------
> Sanja> The first thread which comes to flush (the leader of this flush pass)
> Sanja> detects the buffers which it should flush (beginning and ending / min
> Sanja> and max) and sets them as the goal of the current pass.
> >> Is the 'goal' a struct that is protected by its own mutex?
> Sanja> I thought about using the loghandler lock for changing the flush
> Sanja> state, so the goal will not need its own protection, because only one
> Sanja> thread can change it and the other threads will wait for the state
> Sanja> change while this one changes the goal.
> If we hold the mutex for the whole duration of the flush, there is no
> way for the waiting threads to register which buffers/LSN to flush to.
> Wouldn't it be good for all threads to be able to register their LSN,
> even while one thread is working on the flush, so that when we are
> finished flushing we know who will be the next leader and also the
> max LSN that we need to flush?
The lock is held only for changing/checking the state (see the graph of
states). It is determined who will change the state and why, and what other
threads will do if they arrive in a certain state (see the other mail for a
sketch of the functions).
> Sanja> the "Log Handler Buffer Design" section.
> Sanja> When it reach last buffer it will:
> Sanja> - wait of finishing other thread which.
> >> What do you mean with the above?
> Sanja> - wait of finishing other thread which are flushing buffers.
> I assume you mean 'wait for the threads that are flushing buffers to finish'?
Yes, but only within the range of buffers chosen for flushing during the
current pass (also see the note about group commit at the end of this mail).
> >> Why do we need to calculate which buffers should be flushed?
> >> Isn't it enough to register the max LSN of all waiting threads and
> >> make the thread with max LSN the leader of next flush ?
> Sanja> The problem is that the address which the flush process gets may not
> Sanja> be a real LSN but a "horizon" address. So we would need to get the real
> Sanja> LSN of the real last buffer; the real buffer is much simpler to get (in
> Sanja> most cases it is the last buffer).
> If you have a long running transaction that is doing a lot of work or
> another concurrent transaction inserting blobs then the common case
> will be that it's not the last buffer you need to flush.
> Don't we flush up to the end-LSN (ie, last byte) of the commit record?
> As each buffer must have an active disk address, it should be trivial to
> detect if the LSN we want to flush is part of the current buffer. If
> that's the case we can just flush the whole buffer (including any data
> after the end-LSN if such exists).
Hmmm... I forgot to mention the "everything is flushed" flag that we have. It
will work for such cases: when all buffers are flushed we set it, and we
reset it only when a new LSN is generated. All flush requests made while the
flag is set are just ignored (because everything is flushed :)
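
To make the flag concrete, here is a minimal sketch in C. All names
(log_state, log_new_lsn, log_mark_all_flushed, log_flush_needed) are
hypothetical and chosen only for illustration, and locking is left out; in
the real code the flag would have to be read and written under the
appropriate lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; not the real loghandler structure. */
typedef struct {
    bool everything_flushed;  /* set when all buffers have been flushed */
    uint64_t horizon;         /* last generated log address (assumption) */
} log_state;

/* Generating a new LSN is the only thing that resets the flag. */
static void log_new_lsn(log_state *s, uint64_t lsn)
{
    s->horizon = lsn;
    s->everything_flushed = false;
}

/* Called when a flush pass has left no unflushed buffers. */
static void log_mark_all_flushed(log_state *s)
{
    s->everything_flushed = true;
}

/* A flush request is ignored while the flag is set. */
static bool log_flush_needed(const log_state *s)
{
    return !s->everything_flushed;
}
```

With this, a flush request first checks log_flush_needed() and returns
at once when it is false.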
> Sanja> - then it will take part in flushing buffers (see 'Flush pass' for
> Sanja> details).
> >> Isn't this a common scenario:
> >> - Thread calculates which buffers to flush
> >> - Wait for flush pass of leader to finish
> >> - Wake up
> >> - Notice that everything to its current lsn is already flushed and return
> >> If this is the case, we should in many cases be able to totally avoid
> >> the flush pass and taking any mutex for the different log buffers.
> Sanja> The above is checked at the beginning. If a thread sees that
> Sanja> everything is flushed it quits without waiting; it will wait and do
> Sanja> nothing only if it flushes and all buffers are already taken for
> Sanja> flushing, so it can't do anything helpful and will wait till the end of
> Sanja> the flush process.
> Sanja> - after which it:
> Sanja> - return if all need buffers are flushed (not LSN which could be LSN
> Sanja> from future at the moment of flush call which means all current
> Sanja> buffers).
> In which scenario do we have a 'not LSN'?
> Don't we in all cases, except for checkpoint, flush until the commit
> record is on disk ?
A checkpoint requests that the whole log be flushed, simply by passing the
current horizon as the LSN.
> Sanja> - become leader of the next pass (if it is still registered as leader)
> Sanja> - take part in the next flush pass
> How do you get registered as leader if the 'goal struct' is locked
> during the flush process?
By the other threads, which are not leaders, not writing there. (Others can
write the goal for the next flush pass, and that one will be protected by its
own mutex.) The current goal can be changed only by the leader and only at
the beginning of the flush pass. The pass does not close the last buffer for
adding data until that buffer is reached for the real flush. This may be bad
when a large amount of data is being copied into the buffer, so maybe it is
better to start a new buffer and only then start the flush pass (I am not
sure).
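
A rough sketch of the "goal for the next pass" idea in C. This is an
illustration under assumptions, not the real design: the goal is reduced to
a single maximum LSN (as in your suggestion), and the names next_goal,
next_goal_register and next_goal_take are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical next-pass goal, protected by its own mutex. */
typedef struct {
    pthread_mutex_t lock;
    uint64_t max_lsn;      /* 0 means nothing is registered */
} next_goal;

/* Any non-leader thread may register its LSN while a pass is running. */
static void next_goal_register(next_goal *g, uint64_t lsn)
{
    pthread_mutex_lock(&g->lock);
    if (lsn > g->max_lsn)
        g->max_lsn = lsn;
    pthread_mutex_unlock(&g->lock);
}

/* The leader takes the registered goal at the beginning of its pass
   and clears it for the following pass. */
static uint64_t next_goal_take(next_goal *g)
{
    pthread_mutex_lock(&g->lock);
    uint64_t lsn = g->max_lsn;
    g->max_lsn = 0;
    pthread_mutex_unlock(&g->lock);
    return lsn;
}
```

The current goal itself stays untouched by non-leaders, so it needs no
mutex of its own in this sketch.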
> Sanja> Flush process status
> Sanja> --------------------
> Sanja> - FREE: no thread is flushing
> Sanja> - LEADER_DETECT: leader came but goal is not detected yet (other
> Sanja> threads just will wait)
> Sanja> - FLUSH_PASS: flushing buffer process is in progress
> Sanja> - FLUSH_FINISHING: all buffers are being flushed; we are waiting for
> Sanja> the end of the process
> Sanja> - FLUSH_END: flush finished; the leader is syncing and updating statistics
> Sanja> - NEW_LEADER: switching to the new leader registered for next pass
> Sanja> (if we need new pass) (maybe it will be merged with LEADER_DETECT).
> >> Assuming we store the flush lsn as the goal, we may not need a
> >> LEADER_DETECT or NEW_LEADER phase.
> Sanja> We can't. we should detect which buffers will be flushed in any case
> Sanja> just to share work with other threads (if it is not the only buffer).
> What work is done at the same time?
> Can there be more than one thread doing a flush of some buffers at the
> same time?
Yes. (The reason is given below, in the quote from my previous mail.)
> Sanja> Statuses graph
> Sanja> --------------
> Sanja>                +------------- NEW_LEADER <---------------------------+
> Sanja>                |                                                     |
> Sanja>                V                                                     |
> Sanja> FREE ---> LEADER_DETECT ---> FLUSH_PASS ---> FLUSH_FINISHING --> FLUSH_END
> Sanja>  ^                                                                   |
> Sanja>  |                                                                   |
> Sanja>  +-------------------------------------------------------------------+
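
The statuses can be written down as an enum with a transition check. This
is only a sketch: the function name flush_transition_allowed is
hypothetical, and the edges encode one reading of the graph (FLUSH_END goes
either back to FREE or to NEW_LEADER, and NEW_LEADER leads to
LEADER_DETECT), which is an assumption.

```c
#include <stdbool.h>

/* Status names are taken from the list above. */
typedef enum {
    FREE,
    LEADER_DETECT,
    FLUSH_PASS,
    FLUSH_FINISHING,
    FLUSH_END,
    NEW_LEADER
} flush_status;

/* Returns true if the graph (as read here) has an edge from -> to. */
static bool flush_transition_allowed(flush_status from, flush_status to)
{
    switch (from) {
    case FREE:            return to == LEADER_DETECT;
    case LEADER_DETECT:   return to == FLUSH_PASS;
    case FLUSH_PASS:      return to == FLUSH_FINISHING;
    case FLUSH_FINISHING: return to == FLUSH_END;
    case FLUSH_END:       return to == FREE || to == NEW_LEADER;
    case NEW_LEADER:      return to == LEADER_DETECT;
    }
    return false;
}
```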
> Sanja> Flush pass
> Sanja> ----------
> Sanja> - increase number of participant and start loop from min buffer
> >> Can we have more than one flushing thread active at once ?
> Sanja> yes. I decided that writing to a file is mostly copying data in memory
> Sanja> to the OS file buffer, so many threads can help if we have many CPUs.
> ok, so there can be many flush threads at the same time.
> Are they all defined as leaders ?
> Who will do the final sync ?
The leader will sync as soon as all threads have finished with the buffers
which should be flushed.
> Sanja> - if buffer flushed skip to the next one if no then flush it
> Sanja> - when max buffer of current pass goal is processed: decrease
> Sanja> participant counter; if status is not FLUSH_FINISHING change it. If
> Sanja> counter is 0 switch to FLUSH_END.
> Is it the thread that does 'flush end' that does the sync ?
The leader syncs and updates the statistics (like the last LSN which is
flushed).
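
The bookkeeping of one flush pass, as described in the quoted steps, could
look roughly like this in C. It is a single-threaded sketch with
hypothetical names (flush_pass, pass_join, pass_run, NBUF); the real write
is replaced by setting a flag, and the locking that real concurrent
participants would need is left out.

```c
#include <stdbool.h>

#define NBUF 8  /* hypothetical number of log buffers */

typedef enum {
    FREE, LEADER_DETECT, FLUSH_PASS, FLUSH_FINISHING, FLUSH_END, NEW_LEADER
} flush_status;

typedef struct {
    flush_status status;
    int participants;       /* threads currently taking part in the pass */
    int min_buf, max_buf;   /* goal of the current pass (inclusive) */
    bool flushed[NBUF];     /* per-buffer "already flushed" mark */
} flush_pass;

/* A thread joins the pass: increase the number of participants. */
static void pass_join(flush_pass *p)
{
    p->participants++;
}

/* Walk from min to max buffer, flushing what is not flushed yet.
   Returns how many buffers this participant flushed. */
static int pass_run(flush_pass *p)
{
    int written = 0;
    for (int i = p->min_buf; i <= p->max_buf; i++) {
        if (p->flushed[i])
            continue;           /* flushed by somebody else: skip */
        p->flushed[i] = true;   /* stands in for the real write */
        written++;
    }
    /* Max buffer of the goal is processed. */
    p->participants--;
    if (p->status != FLUSH_FINISHING)
        p->status = FLUSH_FINISHING;
    if (p->participants == 0)
        p->status = FLUSH_END;  /* the leader can sync now */
    return written;
}
```

With two participants, the first one moves the status to FLUSH_FINISHING
and the last one to leave moves it to FLUSH_END.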
> What happens in this scenario?
> Thread 1 want to flush to LSN 1000
> Thread 2 want to flush to LSN 2000
> Thread 3 want to flush to LSN 3000
> Assume thread 2 comes in first and gets to be leader and starts
> flushing buffers.
> Then thread 3 starts to flush.
> Then thread 1 starts to flush
> Thread 1 will notice that thread 2 has already flushed the buffer it
> needs. I assume 1 will now wait for the sync of the log.
Yes. All participants will wait for the end of the flush pass (because we
can't say that it is flushed until we have done the sync).
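
The difference between "written" and "synced" in your scenario can be
sketched in C like this. The names (log_progress, leader_sync,
lsn_is_durable) are hypothetical, and the real sync call is replaced by
copying a number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical progress record for the log. */
typedef struct {
    uint64_t written_lsn;  /* buffers up to here were written to the file */
    uint64_t synced_lsn;   /* file was synced up to here */
} log_progress;

/* Done by the leader at the end of the pass; stands in for the real sync. */
static void leader_sync(log_progress *p)
{
    p->synced_lsn = p->written_lsn;
}

/* A waiting thread may return only when its LSN is covered by a sync. */
static bool lsn_is_durable(const log_progress *p, uint64_t lsn)
{
    return lsn <= p->synced_lsn;
}
```

So thread 1 (LSN 1000) sees its buffer already written by thread 2, but
lsn_is_durable() stays false for it until the leader has synced.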
> When thread 2 is finished, will it sync and return or wait for thread
> 3 to finish and do a sync?
Thread 3 (if LSN 3000 is in a different buffer than LSN 2000) will help with
reaching this sync goal and set its goal as the goal of the next pass.
In the case of group commit, thread 3 will be able to change the current
flush goal if it fits in the time frame of the group commit, but in any case
flushes will be made pass by pass: during each pass some group of buffers
will be flushed and synced, and then the next pass is started.
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Oleksandr Byelkin <sanja@stripped>
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer
/_/ /_/\_, /___/\___\_\___/ Lugansk, Ukraine