List:Commits« Previous MessageNext Message »
From:Libing Song Date:September 16 2010 8:43am
Subject:Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473)
Bug#55375
View as plain text  
On Wed, 2010-09-08 at 13:47 +0300, Andrei Elkin wrote:
> Libing Song <Li-Bing.Song@stripped> writes:
> 
> > On Fri, 2010-09-03 at 23:39 +0300, Andrei Elkin wrote:
> >> Li-Bing, hello.
> >> 
> >> I think we need some work to fix the bug.
> >> The good thing is that we certainly know now what to do.
> >> 
> >> > #At file:///home/anders/work/bzrwork/mysql-5.1-bugteam/ based on
> revid:georgi.kodinov@stripped
> >> >
> >> >  3473 Li-Bing.Song@stripped	2010-08-06
> >> >       Bug #55375  	Transaction bigger than max_binlog_cache_size
> crashes slave
> >> >       
> >> >       When slave executes a transaction bigger than slave's
> max_binlog_cache_size,
> >> >       slave will crash. It is caused by the assert that server should
> only roll back
> >> >       the statement but not the whole transaction if the error
> ER_TRANS_CACHE_FULL 
> >> >       happens. But slave sql thread always rollbacks the whole
> transaction when
> >> >       an error happens.
> >> 
> >> First of all, the issues is the assert in binlog_rollback().
> >> Should slave not have --log-bin it would rollback the transaction
> >> gracefully and none would notice anything.
> >> 
> >> Per my analysis it is the assert in
> >> 
> >>    if (mysql_bin_log.check_write_error(thd))
> >>       assert(!all)
> > yes.
> 
> I am puzzled, in the latest 
> 
> http://lists.mysql.com/commits/117650
> 
> there are still
> 
> changes to 
> 
>    === modified file 'sql/slave.cc'
>    --- a/sql/slave.cc	2010-07-20 18:07:36 +0000
>    +++ b/sql/slave.cc	2010-09-07 03:27:24 +0000
>    @@ -2343,7 +2343,7 @@ static int exec_relay_log_event(THD* thd
>    else
>    {
>    exec_res= 0;
>    -            end_trans(thd, ROLLBACK);
>    +            const_cast<Relay_log_info*>(rli)->cleanup_context(thd, 1);
Above code is not related to the bug.
end_trans is called by cleanup_context().
Don't you think cleanup_context() should be call when a transaction
fails and needs to retry?
> 
> relating to ending transaction on the slave whereas it *only* the assert
> to refine that is necessary:
> 
> if (mysql_bin_log.check_write_error(thd))
>    assert(!all || thd->slave_thread)
> 
Yes, it is easy one. But I agree with the comments that 'all' is true
means a real 'ROLLBACK' statement and the 'ROLLBACK' statement can not
cause a write problem.
On slave, we call end_trans() after a transaction has failed. It means
we fake a 'ROLLBACK' statement. So I think we should clear the thd's
error before we fire a 'ROLLBACK' automatically.

> 
> >> 
> >> that is not perfect.
> >> 
> >> I failed to understand comments backing it up,
> >> but according to the code of
> >> 
> >>   bool MYSQL_BIN_LOG::check_write_error(THD *thd)
> >> 
> >>     case ER_TRANS_CACHE_FULL:
> >>     case ER_ERROR_ON_WRITE:
> >>     case ER_BINLOG_LOGGING_IMPOSSIBLE:
> >> 
> >>         checked= TRUE;
> >> 
> >> the assert means to say the three errors can not cause the transaction
> >> rollback. Neither can do that ROLLBACK query itself
> >> (maybe that is what's behind "this must not happen as a rollback is
> >> written directly to the binary log" comments).
> >> And that's correct on the master, but not on the slave.
> > yes.
> >> 
> >> The patch has to take on this little finding. But using the chance
> >> I am leaving few few words on some places.
> ...
> >> 
> >> >       sql/log_event.cc
> >> 
> >> I agree with some ideas of refactoring of errors handing in that file,
> >> but this piece certainly overlaps with other work on existing
> >> bugs in that area, e.g
> >> BUG#36524 incident of deadlock on slave is over-dramatized
> >> (needs merging)
> >> Bug #25597 Extraneous error message displayed for slave SQL thread
> >> errors 
> >> (you're welcome to take over even though it's mine and in-Progress).
> >> 
> > I didn't see too much similarity between these bugs.
> 
> That's reasonable. They were reported ago. Their common part is in the
> error reporting in slave system.  BUG#36524 tries to enhance the error reporting to
> engage thd->main_da more actively.
> 
> 
> >> Unless this bug requires those changes (then more clarification is
> >> necessary) I suggest to put them aside.
> 
> > I think so. It will report an error which's number is 1 if I don't fix
> > it. It is apparently wrong. 
> 
> That'd be a better way. Thanks.
> 
> cheers,
> 
> Andrei

-- 
Your Sincerely,
Libing Song
==================================
MySQL Replication Team
Software Engineer


Email : Li-Bing.Song@stripped
Skype : libing.song
MSN   : slb_database@stripped
Phone : +86 010-6505-4020 ext. 319
Mobile: +86 138-1144-2038
==================================

Thread
bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473) Bug#55375Li-Bing.Song6 Aug
  • Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473)Bug#55375Alfranio Correia23 Aug
    • Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473)Bug#55375Libing Song23 Aug
      • Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473)Bug#55375Alfranio Correia23 Aug
Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473)Bug#55375Libing Song6 Sep
Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473)Bug#55375Libing Song7 Sep
Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473)Bug#55375Libing Song7 Sep
Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473)Bug#55375Libing Song16 Sep
Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3473)Bug#55375Libing Song17 Sep