Mats Kindahl wrote:
>
> He Zhenxing wrote:
> > Mats Kindahl wrote:
> >> He Zhenxing wrote:
> >>> Hi Mats
> >>>
> >>> I think the patch is good, approved!
> >>>
> >>> But I have some suggestions not strictly related to this bug, they are
> >>> related to the code handling binlog and transactions in general, I
> think
> >>> this part of the code is not working as expected, BUG#40221, and this
> >>> bug are all related to this. And there might be other hidden bugs too.
> I
> >>> think we should make this part of code more consistent and simpler. And
> >>> I think your patch for this bug is one step of this goal.
> >>>
> >>> According to the comment for THD::binlog_start_trans_and_stmt, this
> >>> function should be called for all statments in a
> >>> transaction(AUTOCOMMIT=0 or BEGIN is seen), but actually this is not
> the
> >>> case. Mostly, this function is only called for the first statement of
> >>> the transaction. BUG#40221 reveals that this is not called for other
> >>> statements in RBR, and the truth is that this is not called for other
> >>> statements in SBR either, which also means that
> >>> THD::binlog_commit/THD::binlog_rollback is not called for each
> statement
> >>> in the transaction as expected. The reason is because in SBR, we only
> >>> call THD::binlog_start_trans_and_stmt when the transaction log is
> empty,
> >>> so the thd->transaction.stmt.ha_list will not be set. Although I
> don't
> >>> see any problems caused by this, but I think it is a problem if the
> code
> >>> is not working as we supposed.
> >>>
> >>> So here are some ideas:
> >>> 1. always log all events (transactional or non-transactional updates)
> to
> >>> the cache first to simplify the logic.
> >> I think this is already the case. If it isn't, we should fix that.
> >>
> >
> > Currently, only all events in a transaction will be logged into cache
> > first, but statements not in a transaction that modifies
> > non-transactional tables will go directly into log file, I think we can
> > write this into the transaction cache at first too, and then flush to
> > log file. Of course, this can impact the performance.
>
> I think the impact on performance is overrated, and here is why:
>
> Writing events to a cache takes involves executing a piece of code as well as
> writing data to the cache. Copying data between caches takes a smaller amount of
> time, but it is an added penalty.
>
> If one writes directly to the log file, the size of the event is not known
> beforehand (it is not generated yet), so that means that the binary log has to
> be locked for the duration of the generation and writing to the binary log.
>
> If one applies the trick we have been discussing in WL#4007, the there is a very
> small lock time to just update a counter.
>
> If events are first written to the transactional cache, and then flushed to the
> binary log, it involves an extra copy of the bytes to the binary log, which
> increases latency, but OTOH, the binary log just has to be written for the
> duration of updating counters (there will be two), which means that the
> throughput increases significantly.
>
Very good, let's do it, maybe not right now, but I think we should once
we get the time.
[ snip ]