List: Maria Storage Engine
From: Michael Widenius
Date: May 8 2008 5:33pm
Subject: Re: Plans to make transaction log flush unserialised

>>>>> "Sanja" == Sanja Byelkin <sanja@stripped> writes:


Sanja> The first thread that comes to flush will lock the serialisation mutex,
Sanja> check that the requested LSN is not already flushed, then go through the
Sanja> buffer list and flush the buffers one by one until the LSN is flushed. If
Sanja> it gets to the current buffer, that buffer will be forcibly closed for
Sanja> writing (the last page will be copied to the next buffer so that it can
Sanja> be continued).
>> Why copy the buffer?
>> Why not just write the buffer to disk and keep a pointer into the buffer
>> to the first byte that is not yet flushed?
>> (It would be good to avoid a memory copy of an 8K block.)

Sanja> 1) Because of CRC and sector protection. Without them it would be possible
Sanja> to avoid copying (there is even a TODO in the code about it), but that is
Sanja> a speed optimisation, which is not our goal now.

ok, as long as there is a TODO item for it and it's not easy to change
to this method now.

Sanja> 2) Also, we should not forget about filling the unused part of the page
Sanja> (it is not really a problem, but it has to be mentioned so that we don't
Sanja> forget about it).
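The two points above can be illustrated with a small C sketch. Everything in it is hypothetical: an 8 KB page whose first 4 bytes hold a standard CRC-32 of the rest, a filler byte of 0xFF, and the names log_page, finalize_page and close_last_page. The real Maria page format and its sector protection differ. The sketch only shows that the unfinished last page is padded and checksummed once, while writers continue on a copy in the next buffer, so the image being written never changes underneath its CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_PAGE_SIZE       8192  /* assumed page size */
#define LOG_PAGE_DATA_START 4     /* hypothetical: bytes 0..3 hold the CRC */
#define LOG_FILLER          0xFF  /* hypothetical filler for the unused tail */

typedef struct {
  uint8_t page[LOG_PAGE_SIZE];
  size_t  used;                   /* bytes in use, CRC field included */
} log_page;

/* Standard reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return crc ^ 0xFFFFFFFFu;
}

/* Point 2): fill the unused part of the page, then store the CRC. */
static void finalize_page(log_page *p)
{
  assert(p->used >= LOG_PAGE_DATA_START && p->used <= LOG_PAGE_SIZE);
  memset(p->page + p->used, LOG_FILLER, LOG_PAGE_SIZE - p->used);
  uint32_t crc = crc32_bitwise(p->page + LOG_PAGE_DATA_START,
                               LOG_PAGE_SIZE - LOG_PAGE_DATA_START);
  memcpy(p->page, &crc, sizeof crc);  /* host byte order: sketch only */
}

/* Point 1): copy the unfinished last page into the next buffer so writers
   can keep appending there, then freeze the old image for the flush. */
static void close_last_page(log_page *last, log_page *next_first)
{
  memcpy(next_first->page, last->page, last->used);
  next_first->used = last->used;
  finalize_page(last);
}
```

One plausible reading of 1): without the copy, a writer appending to the page while it is being written could make the bytes on disk disagree with the stored CRC.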

>> What happens with threads that want to write a log entry to the last
>> buffer while the flush is in progress? Do they have to wait until the
>> whole flush is done, or do they only have to wait if the buffer they
>> want to use is in the flush stage?

Sanja> As soon as we have started a new buffer (at the beginning of the flush
Sanja> process), it is open for adding information while other threads flush
Sanja> the other buffers.



Sanja> First leader
Sanja> ------------
Sanja> The first thread that comes to flush (the leader of this flush pass)
Sanja> detects which buffers it should flush (beginning and end, i.e. min and
Sanja> max) and sets them as the goal of the current pass.
>> Is the 'goal' a struct that is protected by its own mutex?

Sanja> I thought about using the loghandler lock for changes of the flush state,
Sanja> so the goal will not need its own protection: only one thread can change
Sanja> it, and other threads will wait for the state change while this one
Sanja> changes the goal.
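A minimal sketch of this proposal with POSIX threads, assuming hypothetical names (flush_control, set_goal_locked, wait_for_goal_locked and the GOAL_* states): the goal is written only by the leader while it holds the log handler lock, and other threads sleep on a condition variable until the state changes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names; only the rule "the goal is changed by one thread
   under the log handler lock" comes from the discussion. */
enum goal_state { GOAL_FREE, GOAL_LEADER_DETECT, GOAL_FLUSH_PASS };

struct flush_control {
  pthread_mutex_t loghandler_lock;  /* protects every field below */
  pthread_cond_t  state_changed;    /* broadcast on each state change */
  enum goal_state state;
  unsigned goal_min_buffer;         /* first buffer of the current pass */
  unsigned goal_max_buffer;         /* last buffer of the current pass */
};

/* Leader, holding loghandler_lock: publish the goal.  Only the thread that
   moved the state to GOAL_LEADER_DETECT gets here, so the goal needs no
   mutex of its own. */
static void set_goal_locked(struct flush_control *fc,
                            unsigned min_buf, unsigned max_buf)
{
  assert(fc->state == GOAL_LEADER_DETECT);
  fc->goal_min_buffer = min_buf;
  fc->goal_max_buffer = max_buf;
  fc->state = GOAL_FLUSH_PASS;
  pthread_cond_broadcast(&fc->state_changed);
}

/* Any other thread, holding loghandler_lock: sleep (releasing the lock)
   until the leader has published the goal. */
static void wait_for_goal_locked(struct flush_control *fc)
{
  while (fc->state == GOAL_LEADER_DETECT)
    pthread_cond_wait(&fc->state_changed, &fc->loghandler_lock);
}
```

pthread_cond_wait releases the lock while a thread sleeps, so in this sketch the lock is held only while the goal or state is being changed; whether the real code holds it for longer is the point raised in the reply.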

If we hold the mutex for the whole duration of the flush, there is no
way for the waiting threads to register which buffers/LSN to flush to.

Wouldn't it be good for all threads to be able to register their LSN,
even while one thread is working on a flush, so that when we have
finished flushing we know who will be the next leader and also the
max LSN that we need to flush?
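The registration suggested here could look like the following C sketch. The structure, the function names and the use of an integer thread id are assumptions for illustration. The lock is held only while a request is recorded, never during a disk write, so threads can register their LSN while a flush is running, and the thread finishing a pass learns both the next leader and the maximum LSN to flush.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t LSN;  /* assumption: LSNs compare as plain integers */

/* Hypothetical bookkeeping for waiting threads. */
struct flush_requests {
  pthread_mutex_t lock;   /* held only briefly, never during a write */
  LSN flushed_lsn;        /* everything up to here is on disk */
  LSN max_requested_lsn;  /* highest LSN any waiting thread asked for */
  int next_leader;        /* thread that leads the next pass, -1 if none */
};

/* Record a request.  Returns 1 if the caller became the leader of the next
   pass, 0 if its LSN is already on disk or another thread will lead. */
static int register_flush_request(struct flush_requests *r, int thread_id,
                                  LSN lsn)
{
  int leader = 0;
  assert(thread_id >= 0);
  pthread_mutex_lock(&r->lock);
  if (lsn > r->flushed_lsn) {
    if (lsn > r->max_requested_lsn)
      r->max_requested_lsn = lsn;
    if (r->next_leader < 0) {
      r->next_leader = thread_id;
      leader = 1;
    }
  }
  pthread_mutex_unlock(&r->lock);
  return leader;
}

/* Called by the thread 'self' when its pass ends.  Returns the leader of the
   next pass (-1 if every registered request is now satisfied) and stores
   the goal of that pass in *next_goal. */
static int finish_pass(struct flush_requests *r, int self,
                       LSN newly_flushed, LSN *next_goal)
{
  int leader = -1;
  pthread_mutex_lock(&r->lock);
  if (newly_flushed > r->flushed_lsn)
    r->flushed_lsn = newly_flushed;
  *next_goal = r->flushed_lsn;
  if (r->max_requested_lsn > r->flushed_lsn) {
    /* Work remains: hand over to the registered thread, or keep going. */
    leader = r->next_leader >= 0 ? r->next_leader : self;
    *next_goal = r->max_requested_lsn;
  }
  r->next_leader = -1;  /* later requests elect the leader after that */
  pthread_mutex_unlock(&r->lock);
  return leader;
}
```

For example, if threads register LSNs 300 and 200 while LSN 100 is on disk, the first of them becomes the next leader and the next pass aims for 300.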


Sanja> the "Log Handler Buffer Design" section.
Sanja> When it reaches the last buffer it will:
Sanja> - wait of finishing other thread which.
>> What do you mean by the above?

Sanja> - wait of finishing other thread which are flushing buffers.

I assume you mean 'wait for the threads that are flushing buffers to finish'?


>> Why do we need to calculate which buffers should be flushed?
>> Isn't it enough to register the max LSN of all waiting threads and
>> make the thread with the max LSN the leader of the next flush?

Sanja> The problem is that the address the flush process gets may be a
Sanja> "horizon" address rather than a real LSN. So we would need to get the
Sanja> real LSN of the real last buffer; the real buffer is much simpler to get
Sanja> (in most cases it is the last buffer).

If you have a long-running transaction that is doing a lot of work, or
another concurrent transaction inserting blobs, then the common case
will be that it's not the last buffer you need to flush.

Don't we flush up to the end-LSN (i.e., the last byte) of the commit record?
As each buffer must have an active disk address, it should be trivial to
detect if the LSN we want to flush is part of the current buffer. If
that's the case we can just flush the whole buffer (including any data
after the end-LSN, if such exists).

Sanja> - then it will take part in flushing buffers (see 'Flush pass' for
Sanja> details).
>> Isn't this a common scenario:
>> - Thread calculates which buffers to flush
>> - Waits for the leader's flush pass to finish
>> - Wakes up
>> - Notices that everything up to its current LSN is already flushed and returns
>> If this is the case, we should in many cases be able to totally avoid
>> the flush pass and taking any mutex for the different log buffers.

Sanja> The above is checked at the beginning. If a thread sees that everything
Sanja> is flushed, it quits without waiting. It will wait and do nothing only if
Sanja> it needs a flush and all buffers are already taken for flushing, so it
Sanja> can't do anything helpful and will wait until the end of the flush process.
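A sketch of this early check using C11 atomics; flushed_lsn, already_flushed and advance_flushed are hypothetical names, and LSNs are again treated as plain integers. A thread whose record is already covered returns without touching any lock, and the thread that finishes a flush publishes the new position with a compare-and-swap loop so the value never moves backwards. Advancing it to the highest LSN written is also how the suggestion further down (raising the flush LSN to the current maximum when the last buffer is flushed) would release waiters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t LSN;  /* assumption: LSNs compare as plain integers */
static_assert(sizeof(LSN) == 8, "sketch assumes 64-bit LSNs");

/* Hypothetical global: every LSN less than or equal to this is on disk. */
static _Atomic LSN flushed_lsn;

/* The check done at the beginning: no lock is taken when the wanted
   LSN is already covered. */
static int already_flushed(LSN wanted)
{
  return wanted <= atomic_load_explicit(&flushed_lsn, memory_order_acquire);
}

/* Called by whoever completes a flush.  Never moves flushed_lsn backwards. */
static void advance_flushed(LSN lsn)
{
  LSN cur = atomic_load_explicit(&flushed_lsn, memory_order_relaxed);
  while (cur < lsn &&
         !atomic_compare_exchange_weak_explicit(&flushed_lsn, &cur, lsn,
                                                memory_order_release,
                                                memory_order_relaxed))
    ;  /* the failed CAS reloaded cur; try again */
}
```

The acquire load pairs with the release CAS, so a thread that sees its LSN covered also sees everything the flushing thread did before publishing.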
Sanja> - after which it:
Sanja> - returns if all needed buffers are flushed (not the LSN, which could be
Sanja> an LSN from the future at the moment of the flush call, meaning all
Sanja> current buffers).

In which scenario do we have a 'not LSN'?
Don't we, in all cases except checkpoint, flush until the commit
record is on disk?

Sanja> - becomes leader of the next pass (if it is still registered as leader)
Sanja> - takes part in the next flush pass

How do you get registered as leader if the 'goal struct' is locked
during the flush process?

>> When we flush the last buffer, we could increase the flush LSN to the
>> current maximum LSN. This way we can release a lot of threads without
>> having them do a flush pass.

Sanja> It is done. 


Sanja> Flush process status
Sanja> --------------------
Sanja> - FREE: no thread is flushing
Sanja> - LEADER_DETECT: a leader has arrived but the goal is not detected yet
Sanja> (other threads just wait)
Sanja> - FLUSH_PASS: the buffer flushing process is in progress
Sanja> - FLUSH_FINISHING: all buffers are being flushed; we are waiting for the
Sanja> end of the process
Sanja> - FLUSH_END: flush finished; the leader is syncing and updating statistics
Sanja> - NEW_LEADER: switching to the new leader registered for the next pass
Sanja> (if we need a new pass) (maybe it will be merged with LEADER_DETECT).
>> Assuming we have stored the flush LSN as the goal, we may not need a

Sanja> We can't. We should detect which buffers will be flushed in any case,
Sanja> just to share the work with other threads (if it is not the only buffer).

What work is done at the same time?
Can there be more than one thread doing a flush of some buffers at the
same time?

Sanja> Statuses graph
Sanja> --------------
Sanja> [diagram lost in the archive; only a loop from NEW_LEADER back into the cycle survives]
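The statuses diagram is incomplete in the archive, so the following C sketch records the statuses from the list together with the transitions that can be inferred from the prose: FREE to LEADER_DETECT to FLUSH_PASS to FLUSH_FINISHING to FLUSH_END, then either NEW_LEADER or back to FREE, and NEW_LEADER to LEADER_DETECT. The transition table is an inference, not the author's design, and flush_transition_ok and set_status are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* The statuses from the list above. */
enum flush_status {
  FREE,             /* no thread is flushing */
  LEADER_DETECT,    /* leader arrived, goal not detected yet */
  FLUSH_PASS,       /* buffers are being flushed */
  FLUSH_FINISHING,  /* all buffers taken, waiting for the end */
  FLUSH_END,        /* leader syncs and updates statistics */
  NEW_LEADER        /* switching to the leader of the next pass */
};

/* Transitions inferred from the prose; an assumption, since the original
   diagram is incomplete. */
static bool flush_transition_ok(enum flush_status from, enum flush_status to)
{
  switch (from) {
  case FREE:            return to == LEADER_DETECT;
  case LEADER_DETECT:   return to == FLUSH_PASS;
  case FLUSH_PASS:      return to == FLUSH_FINISHING;
  case FLUSH_FINISHING: return to == FLUSH_END;
  case FLUSH_END:       return to == NEW_LEADER || to == FREE;
  case NEW_LEADER:      return to == LEADER_DETECT;
  }
  return false;
}

/* Change the status, checking the transition in debug builds. */
static void set_status(enum flush_status *status, enum flush_status next)
{
  assert(flush_transition_ok(*status, next));
  *status = next;
}
```

If NEW_LEADER is later merged with LEADER_DETECT, as suggested in the list, FLUSH_END would go straight to LEADER_DETECT and the NEW_LEADER entry would disappear.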
Sanja> Flush pass
Sanja> ----------
Sanja> - increase the number of participants and start the loop from the min buffer
>> Can we have more than one flushing thread active at once?

Sanja> Yes. I decided that writing to a file is mostly copying data in memory to
Sanja> the OS file buffer, so many threads can help if we have many CPUs.

OK, so there can be many flush threads at the same time.
Are they all defined as leaders?
Who will do the final sync?

Sanja> - if the buffer is flushed, skip to the next one; if not, flush it
Sanja> - when the max buffer of the current pass goal is processed: decrease the
Sanja> participant counter; if the status is not FLUSH_FINISHING, change it. If
Sanja> the counter is 0, switch to FLUSH_END.
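A C sketch of the flush pass as described, with POSIX threads. The names, the fixed MAX_BUFFERS and the writes counters are hypothetical, and write_buffer only stands in for the real write. Each participant claims unclaimed buffers between min and max under the lock, writes them without holding it, and on reaching the max buffer sets the finishing flag and decrements the counter; late arrivals that find every buffer claimed just wait. In this sketch the thread that brings the counter to 0 does the FLUSH_END work, which is one possible answer to the question of who does the final sync, not something the thread settles.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUFFERS 8  /* hypothetical number of log buffers */

/* Hypothetical shared state of one flush pass. */
struct flush_pass {
  pthread_mutex_t lock;
  unsigned min_buf, max_buf;   /* goal of the pass, inclusive */
  bool taken[MAX_BUFFERS];     /* claimed by some participant */
  int  writes[MAX_BUFFERS];    /* times each buffer was written */
  int  participants;           /* participant counter */
  bool finishing;              /* FLUSH_FINISHING reached */
  int  syncs;                  /* times the FLUSH_END work ran */
};

/* Stand-in for write(): only the thread that claimed buffer i touches
   writes[i], so no lock is needed here. */
static void write_buffer(struct flush_pass *p, unsigned i)
{
  p->writes[i]++;
}

/* One thread's part in a pass.  Returns true if this thread brought the
   counter to 0 and therefore did the FLUSH_END work (sync, statistics). */
static bool take_part_in_pass(struct flush_pass *p)
{
  pthread_mutex_lock(&p->lock);
  assert(p->max_buf < MAX_BUFFERS);
  if (p->finishing) {                /* every buffer is already claimed */
    pthread_mutex_unlock(&p->lock);
    return false;                    /* caller just waits for FLUSH_END */
  }
  p->participants++;
  for (unsigned i = p->min_buf; i <= p->max_buf; i++) {
    if (p->taken[i])
      continue;                      /* flushed or being flushed: skip */
    p->taken[i] = true;
    pthread_mutex_unlock(&p->lock);  /* no lock held during the write */
    write_buffer(p, i);
    pthread_mutex_lock(&p->lock);
  }
  p->finishing = true;               /* max buffer of the goal processed */
  bool last = (--p->participants == 0);
  if (last)
    p->syncs++;  /* FLUSH_END; a real fsync() would run after unlocking */
  pthread_mutex_unlock(&p->lock);
  return last;
}

/* pthread entry point so several threads can share one pass. */
static void *participant(void *arg)
{
  take_part_in_pass(arg);
  return NULL;
}
```

Starting several threads with participant on the same pass and joining them leaves each buffer written exactly once and the FLUSH_END work done exactly once, whatever the interleaving, because a thread only leaves the loop after every buffer in the goal has been claimed.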

Is it the thread that does 'flush end' that does the sync ?

What happens in this scenario?

Thread 1 wants to flush to LSN 1000
Thread 2 wants to flush to LSN 2000
Thread 3 wants to flush to LSN 3000

Assume thread 2 comes in first and gets to be leader and starts
flushing buffers.
Then thread 3 starts to flush.
Then thread 1 starts to flush

Thread 1 will notice that thread 2 has already flushed the buffer it
needs. I assume thread 1 will now wait for the sync of the log.

When thread 2 is finished, will it sync and return, or wait for thread
3 to finish and do a sync?


Plans to make transaction log flush unserialised (Sanja Byelkin, 11 Apr)
  • Re: Plans to make transaction log flush unserialised (Oleksandr "Sanja" Byelkin, 17 Apr)
  • Re: Plans to make transaction log flush unserialised (Michael Widenius, 7 May)
    • Re: Plans to make transaction log flush unserialised (Sanja Byelkin, 8 May)
      • Re: Plans to make transaction log flush unserialised (Michael Widenius, 8 May)
        • Re: Plans to make transaction log flush unserialised (Sanja Byelkin, 8 May)