From: Michael Widenius
Date: May 8 2008 5:33pm
Subject: Re: Plans to make transaction log flush unserialised
List-Archive: http://lists.mysql.com/maria/31
Message-Id: <18467.14714.562038.147459@narttu.mysql.fi>

Hi!

>>>>> "Sanja" == Sanja Byelkin writes:

Sanja> The first thread that comes to flush will lock the serialisation mutex,
Sanja> check that the requested LSN is not already flushed, then go through
Sanja> the buffer list and flush the buffers one by one until the LSN is
Sanja> flushed. If it reaches the current buffer, that buffer will be forcibly
Sanja> closed for writing (its last page will be copied to the next buffer so
Sanja> that the page can be continued there).

>> Why copy the buffer?
>> Why not just write the buffer to disk and keep a pointer into the buffer
>> to the first byte that is not yet flushed?
>> (It would be good to avoid a memory copy of an 8K block.)

Sanja> 1) Because of CRC and sector protection. Without them it is possible
Sanja> to avoid copying (and there is even a TODO in the code about it), but
Sanja> it is a speed optimisation, which is not our goal now.

ok, as long as there is a TODO item for it and it's not easy to change
to this method now.

Sanja> 2) Also we should not forget about filling the unused part of the page
Sanja> (it is not really a problem, but it has to be mentioned so that we
Sanja> don't forget about it).

>> What happens with threads that want to write a log entry to the last
>> buffer while the flush is in progress? Do they have to wait until the
>> whole flush is done, or do they only have to wait if the buffer they
>> want to use is in the flush stage?

Sanja> As soon as we have started a new buffer (at the beginning of the flush
Sanja> process), it is open for adding information while other threads
Sanja> flush the other buffers.

Good!
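A minimal sketch of the forced close described above, in Python for brevity. Everything in it is hypothetical: `LogBuffer`, `force_close` and the 8192-byte `PAGE_SIZE` (taken from the "8K block" mentioned in the thread) are illustrative names, not the Maria loghandler API. The partially filled last page is copied into a new open buffer that starts at the same page offset, and the image written to disk pads that page with zeros (the actual filler is not specified in the thread).

```python
PAGE_SIZE = 8192  # assumed page size, from the "8K block" in the thread


class LogBuffer:
    """Hypothetical in-memory log buffer: bytes plus the file offset of byte 0."""

    def __init__(self, file_offset: int):
        self.file_offset = file_offset
        self.data = bytearray()
        self.closed = False


def force_close(current: LogBuffer) -> tuple[bytes, LogBuffer]:
    """Close `current` for writing; return (disk image, new open buffer)."""
    current.closed = True
    used = len(current.data)
    last_page_start = (used // PAGE_SIZE) * PAGE_SIZE
    # Copy the unfinished last page so writers can keep appending to it.
    partial = bytes(current.data[last_page_start:])
    # Disk image covers whole pages: pad the unused part of the last page.
    image = bytes(current.data) + b"\0" * ((-used) % PAGE_SIZE)
    nxt = LogBuffer(current.file_offset + last_page_start)
    nxt.data.extend(partial)
    return image, nxt
```

With 10000 bytes in a buffer at offset 0, the image is two full pages (16384 bytes) and the new buffer starts at offset 8192 holding the 1808 bytes of the unfinished page, which will later be written again with more data.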
Sanja> First leader
Sanja> ------------

Sanja> The first thread that comes to flush (the leader of this flush pass)
Sanja> detects which buffers it should flush (beginning and end / min and
Sanja> max) and sets them as the goal of the current pass.

>> Is the 'goal' a struct that is protected by its own mutex?

Sanja> I thought about using the loghandler lock for changing the flush
Sanja> state, so the goal will not need its own protection: only one thread
Sanja> can change it, and the other threads will wait for the state change
Sanja> while this one changes the goal.

If we hold the mutex for the whole duration of the flush, there is no
way for the waiting threads to register which buffers/LSN to flush to.

Wouldn't it be good for all threads to be able to register their LSN,
even while one thread is working on the flush, so that when we have
finished flushing we know who will be the next leader and also the max
LSN that we need to flush?

Sanja> the "Log Handler Buffer Design" section.
Sanja> When it reaches the last buffer it will:
Sanja> - wait of finishing other thread which.

>> What do you mean by the above?

Sanja> - wait of finishing other thread which are flushing buffers.

I assume you mean 'wait for the threads that are flushing buffers to
finish'?

>> Why do we need to calculate which buffers should be flushed?
>> Isn't it enough to register the max LSN of all waiting threads and
>> make the thread with the max LSN the leader of the next flush?

Sanja> The problem is that the address the flush process gets may not be a
Sanja> real LSN but a "horizon" address. So we will need to get the real LSN
Sanja> of the real last buffer; the real buffer is much simpler to get (in
Sanja> most cases it is the last buffer).

If you have a long-running transaction that is doing a lot of work, or
another concurrent transaction inserting blobs, then the common case
will be that it's not the last buffer you need to flush.

Don't we flush up to the end-LSN (i.e., the last byte) of the commit record?
As each buffer must have an active disk address, it should be trivial
to detect whether the LSN we want to flush is part of the current
buffer. If that's the case, we can just flush the whole buffer
(including any data after the end-LSN, if such exists).

Sanja> - then it will take part in flushing buffers (see 'Flush pass' for
Sanja> details).

>> Isn't this a common scenario:
>>
>> - Thread calculates which buffers to flush
>> - Waits for the flush pass of the leader to finish
>> - Wakes up
>> - Notices that everything up to its current LSN is already flushed and returns
>>
>> If this is the case, we should in many cases be able to totally avoid
>> the flush pass and taking any mutex for the different log buffers.

Sanja> The above is checked at the beginning. If a thread sees that
Sanja> everything is flushed, it quits without waiting. It will wait and do
Sanja> nothing only if it comes to flush while all buffers are already taken
Sanja> for flushing, so it can't do anything helpful and will wait till the
Sanja> end of the flush process.

Sanja> - after which it:
Sanja>   - returns if all needed buffers are flushed (not the LSN, which could
Sanja>     be an LSN from the future at the moment of the flush call, which
Sanja>     means all current buffers).

In which scenario do we have a 'not LSN'? Don't we in all cases, except
for checkpoint, flush until the commit record is on disk?

Sanja>   - becomes leader of the next pass (if it is still registered as leader)
Sanja>   - takes part in the next flush pass

How do you get registered as leader if the 'goal struct' is locked
during the flush process?

>> When we flush the last buffer, we could increase the flush LSN to the
>> current maximum LSN. This way we can release a lot of threads without
>> having them do a flush pass.

Sanja> It is done.

Good.
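Monty's suggestions above (every thread registers its LSN even while a flush is running, threads whose LSN is already covered return without a flush pass, and the buffers to write are found from their disk addresses) can be sketched as a leader-based group flush. This is an illustration under stated assumptions, not the Maria loghandler: `GroupFlusher`, `buffers_to_flush` and the `write_and_sync` callback are hypothetical, LSNs are treated as plain byte addresses (real LSNs also carry a file number), and the buffer list is fixed. The mutex is held only to register, compare and publish LSNs, never during I/O.

```python
import threading
from bisect import bisect_right


def buffers_to_flush(starts, flushed_upto, end_lsn):
    """Indexes of the buffers holding log bytes [flushed_upto, end_lsn).

    starts: sorted start addresses of contiguous in-memory buffers.
    Simplification (hypothetical): an LSN is compared directly with addresses.
    """
    if end_lsn <= flushed_upto:
        return []  # already on disk: no flush pass needed
    first = max(bisect_right(starts, flushed_upto) - 1, 0)
    last = bisect_right(starts, end_lsn - 1) - 1  # buffer holding the last byte
    return list(range(first, last + 1))


class GroupFlusher:
    """Leader-based group flush; the mutex is never held during I/O."""

    def __init__(self, starts, write_and_sync):
        self._starts = starts
        self._write_and_sync = write_and_sync  # hypothetical I/O callback
        self._cond = threading.Condition()
        self._flushed = 0        # log is durable up to this address
        self._requested = 0      # max end-LSN registered by any thread
        self._leader_active = False

    @property
    def flushed_lsn(self):
        with self._cond:
            return self._flushed

    def flush(self, end_lsn):
        with self._cond:
            # Register our goal even while another thread is flushing.
            self._requested = max(self._requested, end_lsn)
            while end_lsn > self._flushed and self._leader_active:
                self._cond.wait()
            if end_lsn <= self._flushed:
                return  # covered by an earlier pass: no buffer work at all
            self._leader_active = True  # we lead the next pass
            start, goal = self._flushed, self._requested
        self._write_and_sync(buffers_to_flush(self._starts, start, goal))
        with self._cond:
            self._flushed = goal  # releases every waiter with end_lsn <= goal
            self._leader_active = False
            self._cond.notify_all()
```

Because the leader's goal is the maximum LSN registered so far, one pass can release many waiters at once; a waiter whose LSN is still not covered becomes the leader of the next pass, again flushing up to the maximum registered LSN. Note that a buffer whose last page was only partly flushed (buffer 0 when flushing 100 and then 9000 with the starts `[0, 8192, 16384]`) is written again by the next pass.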
Sanja> Flush process status
Sanja> --------------------

Sanja> - FREE: no thread is flushing
Sanja> - LEADER_DETECT: a leader came but the goal is not detected yet (other
Sanja>   threads will just wait)
Sanja> - FLUSH_PASS: the buffer-flushing process is in progress
Sanja> - FLUSH_FINISHING: all buffers are being flushed; we are waiting for
Sanja>   the end of the process
Sanja> - FLUSH_END: the flush has finished; the leader is syncing and updating
Sanja>   statistics
Sanja> - NEW_LEADER: switching to the new leader registered for the next pass
Sanja>   (if we need a new pass) (maybe it will be merged with LEADER_DETECT).

>> Assuming we store the flush LSN as the goal, we may not need a
>> LEADER_DETECT or NEW_LEADER phase.

Sanja> We can't. We should detect which buffers will be flushed in any case,
Sanja> just to share the work with other threads (if it is not the only
Sanja> buffer).

What work is done at the same time? Can there be more than one thread
flushing some buffers at the same time?

Sanja> Statuses graph
Sanja> --------------

Sanja>           +------------- NEW_LEADER <--------------------------------+
Sanja>           |                                                          |
Sanja>           V                                                          |
Sanja> FREE ---> LEADER_DETECT ---> FLUSH_PASS ---> FLUSH_FINISHING --> FLUSH_END
Sanja>  ^                                                                   |
Sanja>  |                                                                   |
Sanja>  +-------------------------------------------------------------------+

Sanja> Flush pass
Sanja> ----------

Sanja> - increase the number of participants and start the loop from the min
Sanja>   buffer

>> Can we have more than one flushing thread active at once?

Sanja> Yes. I decided that writing to a file is mostly copying data in memory
Sanja> to the file buffer of the OS, so many threads can help if we have many
Sanja> CPUs.

ok, so there can be many flush threads at the same time. Are they all
defined as leaders? Who will do the final sync?

Sanja> - if the buffer is flushed, skip to the next one; if not, flush it
Sanja> - when the max buffer of the current pass goal has been processed:
Sanja>   decrease the participant counter; if the status is not
Sanja>   FLUSH_FINISHING, change it to FLUSH_FINISHING. If the counter is 0,
Sanja>   switch to FLUSH_END.
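The statuses graph and the flush pass described above can be modelled as follows. This is a hypothetical Python sketch, not Maria code: `FlushPass`, `participate` and the `write_buffer` callback are invented names, the NEW_LEADER -> LEADER_DETECT edge is one reading of the diagram, and only a single pass (FREE to FLUSH_END) is driven. Several threads may join a pass; each claims not-yet-taken buffers between the goal's min and max under a short lock, writes them outside the lock, and the last participant to leave switches the state to FLUSH_END.

```python
import threading
from enum import Enum, auto


class FlushState(Enum):
    FREE = auto()
    LEADER_DETECT = auto()
    FLUSH_PASS = auto()
    FLUSH_FINISHING = auto()
    FLUSH_END = auto()
    NEW_LEADER = auto()


# Edges of the statuses graph (NEW_LEADER -> LEADER_DETECT is an assumed reading).
ALLOWED = {
    FlushState.FREE: {FlushState.LEADER_DETECT},
    FlushState.LEADER_DETECT: {FlushState.FLUSH_PASS},
    FlushState.FLUSH_PASS: {FlushState.FLUSH_FINISHING},
    FlushState.FLUSH_FINISHING: {FlushState.FLUSH_END},
    FlushState.FLUSH_END: {FlushState.FREE, FlushState.NEW_LEADER},
    FlushState.NEW_LEADER: {FlushState.LEADER_DETECT},
}


class FlushPass:
    """One flush pass over buffers lo..hi that any number of threads may join."""

    def __init__(self, n_buffers, write_buffer):
        self._lock = threading.Lock()
        self._write_buffer = write_buffer  # hypothetical I/O callback
        self._taken = [False] * n_buffers  # claimed by some participant
        self.state = FlushState.FREE
        self.history = [self.state]
        self.participants = 0
        self.lo = self.hi = 0

    def _set(self, new):  # caller must hold self._lock
        if new not in ALLOWED[self.state]:
            raise RuntimeError(f"illegal transition {self.state} -> {new}")
        self.state = new
        self.history.append(new)

    def start(self, lo, hi):
        """Leader: detect the goal (min and max buffer) and open the pass."""
        with self._lock:
            self._set(FlushState.LEADER_DETECT)
            self.lo, self.hi = lo, hi
            self._set(FlushState.FLUSH_PASS)

    def participate(self):
        """Flush unclaimed buffers of the goal; False if the pass is closed."""
        with self._lock:
            if self.state is not FlushState.FLUSH_PASS:
                return False  # every buffer is already taken: nothing to help with
            self.participants += 1
        for i in range(self.lo, self.hi + 1):
            with self._lock:
                if self._taken[i]:
                    continue  # flushed, or being flushed by another participant
                self._taken[i] = True
            self._write_buffer(i)  # I/O outside the lock
        with self._lock:
            if self.state is FlushState.FLUSH_PASS:
                self._set(FlushState.FLUSH_FINISHING)
            self.participants -= 1
            if self.participants == 0:
                self._set(FlushState.FLUSH_END)  # all writes of the pass are done
        return True
```

After `p.start(2, 6)`, any number of threads calling `p.participate()` write buffers 2 to 6 exactly once between them, whichever threads happen to claim each one; who then performs the sync is left open here, as it is in the thread.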
Is it the thread that does 'flush end' that does the sync?

What happens in this scenario?

Thread 1 wants to flush to LSN 1000
Thread 2 wants to flush to LSN 2000
Thread 3 wants to flush to LSN 3000

Assume thread 2 comes in first, gets to be leader and starts flushing
buffers.
Then thread 3 starts to flush.
Then thread 1 starts to flush.

Thread 1 will notice that thread 2 has already flushed the buffer it
needs. I assume 1 will now wait for the sync of the log. Correct?

When thread 2 is finished, will it sync and return, or wait for thread 3
to finish and do a sync?

Regards,
Monty