Hi Andrei,
On Thu, 2010-04-22 at 11:20 +0300, Andrei Elkin wrote:
> Luis, hello.
>
>
> The patch is great!
> Although my heart was crying to see how ruthlessly you had exterminated my awesome
> simulations :-).
:)
> But the gain simplicity heals it.
>
> I have few little findings.
>
> > Solution: We fix this by not saving and restoring the
> > last_master_timestamp when the SQL thread goes to sleep. Instead,
> > we set the rli->last_master_timestamp to zero, when the SQL
> > thread starts waiting, and keep it to zero,
>
> > when the SQL thread awakes.
>
> Would it be better to say
>
> until the SQL thread has been awakened (by a new event) ?
>
> or as you're saying later
Ok.
>
> > A complementary change in this patch, is the one that
> > sets rli->last_master_timestamp before and not after event
> > execution. Both of these changes make the spike go away, and keep
> > showing zero,
>
> > until the event starts being processed,
>
> in here ^.
>
> > as opposed
> > to showing zero until the event finishes its execution.
> > @ mysql-test/suite/manual/t/rpl_replication_delay.test
> > Removed this unmaintained test case.
> > @ mysql-test/suite/rpl/t/seconds_behind_master.test
> > Added some new tests. Collected remains from the removed
> > rpl_replication_delay, refactored and added in this file as well.
> > @ sql/log_event.cc
> > Reworked the fault injection to meet the new test case needs.
> > Refactored the stmt_done interface (does not need 'when' anymore.
> > @ sql/log_event_old.cc
> > Accomodating changes in the stmt_done interface.
> > @ sql/rpl_rli.cc
> > Updated stmt_done implementation.
> > @ sql/rpl_rli.h
> > Changed stmt_done interface, removing the not needed
> > event_creation_time.
> > @ sql/slave.cc
> > Two changes:
> > (i) removed the save and restore of rli->last_master_timestamp,
> > when the SQL thread goes to sleeps.
> > (ii) Updating the rli->last_master_timestamp at the beginning of
> > the event execution.
> >
> > removed:
> > mysql-test/suite/manual/
> > mysql-test/suite/manual/r/
> > mysql-test/suite/manual/r/rpl_replication_delay.result
> > mysql-test/suite/manual/t/
> > mysql-test/suite/manual/t/rpl_replication_delay-slave.opt
> > mysql-test/suite/manual/t/rpl_replication_delay.test
> > added:
> > mysql-test/suite/rpl/r/seconds_behind_master.result
> > mysql-test/suite/rpl/t/seconds_behind_master.test
> > modified:
> > sql/log_event.cc
> > sql/log_event_old.cc
> > sql/rpl_rli.cc
> > sql/rpl_rli.h
> > sql/slave.cc
>
> > === added file 'mysql-test/suite/rpl/t/seconds_behind_master.test'
> > --- a/mysql-test/suite/rpl/t/seconds_behind_master.test 1970-01-01 00:00:00
> +0000
> > +++ b/mysql-test/suite/rpl/t/seconds_behind_master.test 2010-04-15 16:39:31
> +0000
> > @@ -0,0 +1,152 @@
> > +-- source include/master-slave.inc
> > +-- source include/have_debug.inc
> > +
> > +# make the test only run once (STMT is actually needed because of the SLEEPs
> below)
> > +-- source include/have_binlog_format_statement.inc
> > +
> > +SET @old_debug= @@global.debug;
> > +call mtr.add_suppression('Unsafe statement binlogged in statement format since
> BINLOG_FORMAT = STATEMENT');
> > +
> > +#
> > +# Based on BUG#29309 test case
> > +#
> > +
> > +connection master;
> > +create table t1 (f1 int);
> > +sync_slave_with_master;
> > +source include/stop_slave.inc;
> > +connection master;
> > +insert into t1 values (1);
> > +
> > +sleep 3;
> > +
> > +#connection slave;
> > +connection slave;
> > +lock table t1 write;
> > +source include/start_slave.inc;
> > +
> > +# ok, now wait for the SQL thread to start the insert
> > +let $wait_timeout= 60; # Override default of 30 seconds with 60.
> > +let $show_statement= SHOW PROCESSLIST;
> > +let $field= Info;
> > +let $condition= = 'insert into t1 values (1)';
> > +source include/wait_show_condition.inc;
> > +
> > +let $sbm= query_get_value("SHOW SLAVE STATUS", Seconds_Behind_Master, 1);
> > +if (`SELECT $sbm < 3 or $sbm > 6`)
> > +{
> > + -- echo #### Seconds behind master must show value approx. to 3
> > + -- echo #### giving a 3 secs margin (machine where the test executes
> > + -- echo #### may be slow). Best effort approach.
> > + -- echo #### $sbm > 6 || $sbm < 3
> > + -- die
> > +}
>
> I and Sven discussed similar issue on #rep yesterday.
> Sure, assert($sbm >= 3) shall hold because of the explicit --sleep 3.
> However, there is no determined range - it all depends on exec env - and therefore
> the right margin assert($sbm <= 6).
> We can only wait for some reasonable time in
> > +source include/wait_show_condition.inc;
>
> So I suggest to relax the assert to if (`SELECT $sbm < 3) ...
OK, but then as you state:
"Sure, assert($sbm >= 3) shall hold because of the explicit --sleep 3."
I guess the assertion will then be almost empty. However,
it provides some coverage... So I agree. I'll keep the "< 3"
condition (the alternative would be to remove this part of the
test altogether).
> > +if (`SELECT $sbm < 3 or $sbm > 6`)
> > +
> > +unlock tables;
> > +connection master;
> > +sync_slave_with_master;
> > +lock table t1 write;
> > +connection master;
> > +insert into t1 values (2);
> > +
> > +sleep 3;
> > +
> > +connection slave;
> > +# ok, now wait for the SQL thread to start the insert
> > +let $wait_timeout= 60; # Override default of 30 seconds with 60.
> > +let $show_statement= SHOW PROCESSLIST;
> > +let $field= Info;
> > +let $condition= = 'insert into t1 values (2)';
> > +source include/wait_show_condition.inc;
> > +
> > +flush logs /* should not reset the sbm value */;
> > +
> > +sleep 3;
> > +
> > +let $sbm= query_get_value("SHOW SLAVE STATUS", Seconds_Behind_Master, 1);
> > +if (`SELECT $sbm < 6 or $sbm >= 9`)
> > +{
> > + -- echo #### Seconds behind master must show value approx. to 6
> > + -- echo #### giving a 3 secs margin (machine where the test executes
> > + -- echo #### may be slow). Best effort approach.
> > + -- echo #### $sbm < 6 || $sbm >= 9
> > + -- die
> > +}
>
> Same as above.
Yep, same as above.
>
> > +
> > +unlock tables;
> > +connection master;
> > +drop table t1;
> > +sync_slave_with_master;
> > +
> > +#
> > +# BUG#52166 test case
> > +#
> > +
> > +# 1. Check that seconds behind master is set to 0 when SQL
> > +# thread waits for more events.
> > +# 2. Check that seconds behind master is updated once after
> > +# a newly received event starts to be processed.
> > +
> > +-- source include/master-slave-reset.inc
> > +-- connection master
> > +
> > +# this will increase the timestamp +3600
> > +SET GLOBAL DEBUG= '+d,dec_event_time_by_1_hour';
> > +CREATE TABLE t1 (a INT);
> > +SET GLOBAL DEBUG= '-d,dec_event_time_by_1_hour';
> > +-- sync_slave_with_master
> > +
> > +# ok, now wait for the SQL thread to sleep
> > +let $wait_timeout= 60; # Override default of 30 seconds with 60.
> > +let $show_statement= SHOW PROCESSLIST;
> > +let $field= State;
> > +let $condition= = 'Slave has read all relay log; waiting for the slave I/O
> thread to update it';
> > +-- source include/wait_show_condition.inc
> > +
> > +-- let $sbm= query_get_value("SHOW SLAVE STATUS", Seconds_Behind_Master, 1)
> > +if (`SELECT $sbm <> 0`)
> > +{
> > + -- echo #### Seconds behind master must show '0', because SQL thread is now
> waiting for new events.
> > + -- echo #### SBM ($sbm seconds) != 0 !! SQL thread is waiting so should be
> zero.
> > + -- die
> > +}
> > +
> > +-- connection master
> > +
>
> > +# 5 seconds this will give us time at the slave to
>
> You mean to say 10, don't you?
I do. (I had 5 in the initial approach, then switched to 10
and forgot to update the comment).
>
> > +# show that the Seconds_Behind_Master will increase
> > +# when this query starts to execute
> > +-- disable_warnings
> > +SET GLOBAL DEBUG= '+d,dec_event_time_by_1_hour';
> > +INSERT INTO t1 VALUES (sleep (10));
> > +SET GLOBAL DEBUG= '-d,dec_event_time_by_1_hour';
> > +DROP TABLE t1;
> > +-- enable_warnings
> > +
> > +-- connection slave
> > +
> > +# ok, now wait for the SQL thread to start the insert
> > +let $wait_timeout= 60; # Override default of 30 seconds with 60.
> > +let $show_statement= SHOW PROCESSLIST;
> > +let $field= Info;
> > +let $condition= = 'INSERT INTO t1 VALUES (sleep (10))';
> > +-- source include/wait_show_condition.inc
> > +
>
> > +-- sleep 2
> > +
> > +-- let $sbm= query_get_value("SHOW SLAVE STATUS", Seconds_Behind_Master, 1)
> > +if (`SELECT $sbm <= 3600`)
> > +{
> > + -- echo #### Seconds behind master must show > 3600 because the event that
>
> > + -- echo #### is being processed was issued 3600 seconds ago.
> > + -- echo #### SBM ($sbm seconds) <= 3600 !
> > + -- die
> > +}
>
> I suggest to change this paragraph to
>
> source include/wait_for_slave_param
> with
> $slave_param, $slave_param_value Seconds_Behind_Master, 3600 and
> $slave_param_comparison = <=
>
> to avoid the evil sleep.
Ok, but you mean > instead of <=...
We need to wait until "Seconds_Behind_Master > 3600" or,
in other words, while "Seconds_Behind_Master <= 3600"...
>
>
> > +
> > +-- connection master
> > +-- sync_slave_with_master
> > +-- connection master
> > +
> > +SET @@global.debug= @old_debug;
> > +
> > +-- source include/master-slave-end.inc
> >
> > === modified file 'sql/log_event.cc'
> > --- a/sql/log_event.cc 2010-03-03 14:43:35 +0000
> > +++ b/sql/log_event.cc 2010-04-15 16:39:31 +0000
> > @@ -235,10 +235,6 @@ private:
> > flag_set m_flags;
> > };
> >
> > -#ifndef DBUG_OFF
> > -uint debug_not_change_ts_if_art_event= 1; // bug#29309 simulation
> > -#endif
> > -
> > /*
> > pretty_print_str()
> > */
> > @@ -781,32 +777,7 @@ int Log_event::do_update_pos(Relay_log_i
> > Matz: I don't think we will need this check with this refactoring.
> > */
> > if (rli)
> > - {
> > - /*
> > - bug#29309 simulation: resetting the flag to force
> > - wrong behaviour of artificial event to update
> > - rli->last_master_timestamp for only one time -
> > - the first FLUSH LOGS in the test.
> > - */
> > - DBUG_EXECUTE_IF("let_first_flush_log_change_timestamp",
> > - if (debug_not_change_ts_if_art_event == 1
> > - && is_artificial_event())
> > - {
> > - debug_not_change_ts_if_art_event= 0;
> > - });
> > -#ifndef DBUG_OFF
> > - rli->stmt_done(log_pos,
> > - is_artificial_event() &&
> > - debug_not_change_ts_if_art_event > 0 ? 0 : when);
> > -#else
> > - rli->stmt_done(log_pos, is_artificial_event()? 0 : when);
> > -#endif
> > - DBUG_EXECUTE_IF("let_first_flush_log_change_timestamp",
> > - if (debug_not_change_ts_if_art_event == 0)
> > - {
> > - debug_not_change_ts_if_art_event= 2;
> > - });
> > - }
> > + rli->stmt_done(log_pos);
> > return 0; // Cannot fail currently
> > }
> >
> > @@ -942,6 +913,16 @@ bool Log_event::write_header(IO_CACHE* f
> > }
> >
> > now= (ulong) get_time(); // Query start time
>
> > + if (DBUG_EVALUATE_IF("inc_event_time_by_1_hour",1,0) &&
>
> Do we really need inc_event_time_by_1_hour at all?
> The tests don't use it.
No, the tests don't use it. But tests for WL#4806 I think they do.
Can we keep this?
> > + DBUG_EVALUATE_IF("dec_event_time_by_1_hour",1,0))
> > + {
>
> > + DBUG_ASSERT(0);
>
> According to the def of DBUG_EVALUATE_IF() it's a check if DBUG is
> compiled in. Hence the assert does not look as appropriate even though
> it's currently expanded to noop in DBUG_OFF builts.
If DBUG is not compiled in, then the compiler should optimize the
if (0) { A; } else { B; }
to
B;
So in this case A; (the DBUG_ASSERT) is compiled away in DBUG
disabled builds.
Given that B relies on DBUG as well, it should also
be compiled away, and the entire if-else block goes away
in DBUG off builds.
In any case, just to be sure, we can wrap it with
#ifndef DBUG_OFF
#endif
as you had with in your error simulation, can't we?
> > + }
> > + else
> > + {
> > + DBUG_EXECUTE_IF("inc_event_time_by_1_hour", now= now + 3600;);
> > + DBUG_EXECUTE_IF("dec_event_time_by_1_hour", now= now - 3600;);
> > + }
> >
> > /*
> > Header will be of size LOG_EVENT_HEADER_LEN for all events, except for
> > @@ -7873,7 +7854,7 @@ Rows_log_event::do_update_pos(Relay_log_
[snip]
> > /*
>
> regards,
>
> Andrei
Thanks.
Cheers,
Luís