On Wed, 2010-09-08 at 13:47 +0300, Andrei Elkin wrote:
> Libing Song <Li-Bing.Song@stripped> writes:
> > On Fri, 2010-09-03 at 23:39 +0300, Andrei Elkin wrote:
> >> Li-Bing, hello.
> >> I think we need some work to fix the bug.
> >> The good thing is that we certainly know now what to do.
> >> > #At file:///home/anders/work/bzrwork/mysql-5.1-bugteam/ based on
> >> >
> >> > 3473 Li-Bing.Song@stripped 2010-08-06
> >> > Bug #55375 Transaction bigger than max_binlog_cache_size
> crashes slave
> >> >
> >> > When slave executes a transaction bigger than slave's
> >> > slave will crash. It is caused by the assert that server should
> only roll back
> >> > the statement but not the whole transaction if the error
> >> > happens. But slave sql thread always rollbacks the whole
> transaction when
> >> > an error happens.
> >> First of all, the issues is the assert in binlog_rollback().
> >> Should slave not have --log-bin it would rollback the transaction
> >> gracefully and none would notice anything.
> >> Per my analysis it is the assert in
> >> if (mysql_bin_log.check_write_error(thd))
> >> assert(!all)
> > yes.
> I am puzzled, in the latest
> there are still
> changes to
> === modified file 'sql/slave.cc'
> --- a/sql/slave.cc 2010-07-20 18:07:36 +0000
> +++ b/sql/slave.cc 2010-09-07 03:27:24 +0000
> @@ -2343,7 +2343,7 @@ static int exec_relay_log_event(THD* thd
> exec_res= 0;
> - end_trans(thd, ROLLBACK);
> + const_cast<Relay_log_info*>(rli)->cleanup_context(thd, 1);
Above code is not related to the bug.
end_trans is called by cleanup_context().
Don't you think cleanup_context() should be call when a transaction
fails and needs to retry?
> relating to ending transaction on the slave whereas it *only* the assert
> to refine that is necessary:
> if (mysql_bin_log.check_write_error(thd))
> assert(!all || thd->slave_thread)
Yes, it is easy one. But I agree with the comments that 'all' is true
means a real 'ROLLBACK' statement and the 'ROLLBACK' statement can not
cause a write problem.
On slave, we call end_trans() after a transaction has failed. It means
we fake a 'ROLLBACK' statement. So I think we should clear the thd's
error before we fire a 'ROLLBACK' automatically.
> >> that is not perfect.
> >> I failed to understand comments backing it up,
> >> but according to the code of
> >> bool MYSQL_BIN_LOG::check_write_error(THD *thd)
> >> case ER_TRANS_CACHE_FULL:
> >> case ER_ERROR_ON_WRITE:
> >> case ER_BINLOG_LOGGING_IMPOSSIBLE:
> >> checked= TRUE;
> >> the assert means to say the three errors can not cause the transaction
> >> rollback. Neither can do that ROLLBACK query itself
> >> (maybe that is what's behind "this must not happen as a rollback is
> >> written directly to the binary log" comments).
> >> And that's correct on the master, but not on the slave.
> > yes.
> >> The patch has to take on this little finding. But using the chance
> >> I am leaving few few words on some places.
> >> > sql/log_event.cc
> >> I agree with some ideas of refactoring of errors handing in that file,
> >> but this piece certainly overlaps with other work on existing
> >> bugs in that area, e.g
> >> BUG#36524 incident of deadlock on slave is over-dramatized
> >> (needs merging)
> >> Bug #25597 Extraneous error message displayed for slave SQL thread
> >> errors
> >> (you're welcome to take over even though it's mine and in-Progress).
> > I didn't see too much similarity between these bugs.
> That's reasonable. They were reported ago. Their common part is in the
> error reporting in slave system. BUG#36524 tries to enhance the error reporting to
> engage thd->main_da more actively.
> >> Unless this bug requires those changes (then more clarification is
> >> necessary) I suggest to put them aside.
> > I think so. It will report an error which's number is 1 if I don't fix
> > it. It is apparently wrong.
> That'd be a better way. Thanks.
MySQL Replication Team
Email : Li-Bing.Song@stripped
Skype : libing.song
MSN : slb_database@stripped
Phone : +86 010-6505-4020 ext. 319
Mobile: +86 138-1144-2038