On Thu, 2010-09-16 at 14:58 +0300, Andrei Elkin wrote:
> LiBing, hello.
>
> >> >> Per my analysis it is the assert in
> >> >>
> >> >> if (mysql_bin_log.check_write_error(thd))
> >> >> assert(!all)
> >> > yes.
> >>
> >> I am puzzled, in the latest
> >>
> >> http://lists.mysql.com/commits/117650
> >>
> >> there are still
> >>
> >> changes to
> >>
> >> === modified file 'sql/slave.cc'
> >> --- a/sql/slave.cc 2010-07-20 18:07:36 +0000
> >> +++ b/sql/slave.cc 2010-09-07 03:27:24 +0000
> >> @@ -2343,7 +2343,7 @@ static int exec_relay_log_event(THD* thd
> >> else
> >> {
> >> exec_res= 0;
> >> - end_trans(thd, ROLLBACK);
> >> +
> const_cast<Relay_log_info*>(rli)->cleanup_context(thd, 1);
>
> > Above code is not related to the bug.
>
> Exactly.
>
> > end_trans is called by cleanup_context().
>
> Obviously yes.
>
> > Don't you think cleanup_context() should be call when a transaction
> > fails and needs to retry?
>
> Although I agree that cleanup_context() would look okay in this place
> I don't like the idea of changing sources in this bug framework.
> In order to make this little refactoring we have to make sure nothing
> would got broken, and that's an involving task to prove.
> In particular, the prove should account possiblity of calling
> cleanup_context() multiple times.
I am afraid that I have to do that. Because I removed the
cleanup_context() in Rows_log_event::do_apply_event which clears context
if the event fails.
@@ -7649,7 +7654,6 @@
get_type_str(),
RPL_LOG_NAME, (ulong) log_pos);
thd->reset_current_stmt_binlog_row_based();
- const_cast<Relay_log_info*>(rli)->cleanup_context(thd, error);
thd->is_slave_error= 1;
DBUG_RETURN(error);
}
If I didn't remove it, I would have to clear thd's error just before it.
But that is bad as thd->error will be used late in handle_slave_sql.
So I removed cleanup_context here and clean the context later.
There are two places that need to call cleanup_context.
One is at the ending of handle_slave_sql, it has done before.
Another is in exec_relay_log_event() where I added it but you said we
should not add it.
>
> To my brief analysis the current sources don't have any real issue.
> First, if a temp error appeared in Rows-log applying cleanup_context()
> is called there (so your patch seems to call it for 2nd time).
> Second, if the error was from Query-log there is nothing to do inside
> of cleanup_context() except
> >> - end_trans(thd, ROLLBACK);
>
>
>
> I suggest to describe this little finding more formally in a WL or a
> separate bug report.
>
> >>
> >> relating to ending transaction on the slave whereas it *only* the assert
> >> to refine that is necessary:
> >>
> >> if (mysql_bin_log.check_write_error(thd))
> >> assert(!all || thd->slave_thread)
> >>
> > Yes, it is easy one.
> > But I agree with the comments that 'all' is true
> > means a real 'ROLLBACK' statement and the 'ROLLBACK' statement can not
> > cause a write problem.
> > On slave, we call end_trans() after a transaction has failed. It means
> > we fake a 'ROLLBACK' statement. So I think we should clear the thd's
> > error before we fire a 'ROLLBACK' automatically.
>
> A good point, I agree actually.
>
>
> cheers,
>
> Andrei
--
Your Sincerely,
Libing Song
==================================
MySQL Replication Team
Software Engineer
Email : Li-Bing.Song@stripped
Skype : libing.song
MSN : slb_database@stripped
Phone : +86 010-6505-4020 ext. 319
Mobile: +86 138-1144-2038
==================================