On Fri, 2010-09-03 at 23:39 +0300, Andrei Elkin wrote:
> Li-Bing, hello.
>
> I think we need some work to fix the bug.
> The good thing is that we certainly know now what to do.
>
> > #At file:///home/anders/work/bzrwork/mysql-5.1-bugteam/ based on
> revid:georgi.kodinov@stripped
> >
> > 3473 Li-Bing.Song@stripped 2010-08-06
> > Bug #55375 Transaction bigger than max_binlog_cache_size crashes slave
> >
> > When slave executes a transaction bigger than slave's
> max_binlog_cache_size,
> > slave will crash. It is caused by the assert that server should only roll
> back
> > the statement but not the whole transaction if the error
> ER_TRANS_CACHE_FULL
> > happens. But slave sql thread always rollbacks the whole transaction when
> > an error happens.
>
> First of all, the issues is the assert in binlog_rollback().
> Should slave not have --log-bin it would rollback the transaction
> gracefully and none would notice anything.
>
> Per my analysis it is the assert in
>
> if (mysql_bin_log.check_write_error(thd))
> assert(!all)
yes.
>
> that is not perfect.
>
> I failed to understand comments backing it up,
> but according to the code of
>
> bool MYSQL_BIN_LOG::check_write_error(THD *thd)
>
> case ER_TRANS_CACHE_FULL:
> case ER_ERROR_ON_WRITE:
> case ER_BINLOG_LOGGING_IMPOSSIBLE:
>
> checked= TRUE;
>
> the assert means to say the three errors can not cause the transaction
> rollback. Neither can do that ROLLBACK query itself
> (maybe that is what's behind "this must not happen as a rollback is
> written directly to the binary log" comments).
> And that's correct on the master, but not on the slave.
yes.
>
> The patch has to take on this little finding. But using the chance
> I am leaving few few words on some places.
> > Ather this patch, we always clear any error set in sql thread(it is
> different
> > from the error in 'SHOW SLAVE STATUS') and it is cleared before rolling
> back
> > the transaction.
>
> > @ sql/log_event.cc
> > Some functions don't return the error code, so it is a wrong error
> code.
> > The error should always be set into thd->main_da. So we use
> > slave_rows_error_report to report the right error.
>
> I suggest to follow a rule to comment on each changed file.
ok.
>
> >
> > removed:
> > mysql-test/suite/rpl/t/rpl_binlog_max_cache_size-master.opt
> > modified:
> > mysql-test/suite/rpl/r/rpl_binlog_max_cache_size.result
>
> > mysql-test/suite/rpl/t/rpl_binlog_max_cache_size.test
>
> Two little notices in the regression test part, read on there.
>
> > sql/log_event.cc
>
> I agree with some ideas of refactoring of errors handing in that file,
> but this piece certainly overlaps with other work on existing
> bugs in that area, e.g
> BUG#36524 incident of deadlock on slave is over-dramatized
> (needs merging)
> Bug #25597 Extraneous error message displayed for slave SQL thread
> errors
> (you're welcome to take over even though it's mine and in-Progress).
>
I didn't see too much similarity between these bugs.
> Unless this bug requires those changes (then more clarification is
> necessary) I suggest to put them aside.
I think so. It will report an error which's number is 1 if I don't fix
it. It is apparently wrong.
>
>
[snip]
> > +INSERT INTO t1 VALUES (10, repeat("a", 32));
> > +INSERT INTO t1 VALUES (9, repeat("a", 32));
> > +INSERT INTO t1 VALUES (8, repeat("a", 32));
> > +INSERT INTO t1 VALUES (7, repeat("a", 32));
> > +INSERT INTO t1 VALUES (6, repeat("a", 32));
> > +INSERT INTO t1 VALUES (5, repeat("a", 32));
> > +INSERT INTO t1 VALUES (4, repeat("a", 32));
> > +INSERT INTO t1 VALUES (3, repeat("a", 32));
> > +INSERT INTO t1 VALUES (2, repeat("a", 32));
> > +INSERT INTO t1 VALUES (1, repeat("a", 32));
> > +COMMIT;
>
> I suggest to --disable_result_log for the above transaction. it just
> consumes space.
>
OK.
>
> >
>
> > +--echo
> ########################################################################################
> > +--echo # 8 - Bug#55375 Transaction bigger than max_binlog_cache_size
> crashes slave
> > +--echo
> ########################################################################################
>
> Could you please make it more bold to say `regression test' somewhere in
> ^ ?
OK.
>
> > +
> > +SET GLOBAL max_binlog_cache_size = 4096;
> > +SET GLOBAL binlog_cache_size = 4096;
>
> Actually that's max_binlog_cache_size = 4096 that matters.
> SET @@global.binlog_cache_size can be set to arbitrary value.
Please see bug#55377.
we have to set binlog_cache_size until the bug is fixed.
>
> >
>
> cheers,
>
> Andrei
--
Your Sincerely,
Libing Song
==================================
MySQL Replication Team
Software Engineer
Email : Li-Bing.Song@stripped
Skype : libing.song
MSN : slb_database@stripped
Phone : +86 010-6505-4020 ext. 319
Mobile: +86 138-1144-2038
==================================