Hi Sven,
On Wed, 2010-04-28 at 13:05 +0200, Sven Sandberg wrote:
> Hi Luís,
>
> Great work with this! Source patch is almost perfect (except it should
> set the timestamp to the event *end* time as discussed earlier). This is
> a nice simplification and it's great to get rid of some code.
Thanks. Please read note, regarding setting the timestamp to the event's
end, inline.
> Some requested changes in the test case:
>
> (R1) Please add comments to the test case that explain what properties
> are being tested and how they are being tested. Please see comments inline.
Done. (I don't actually think that the test case needs much
improvement on the comments but I have extended them).
> (R2) I think the races can be removed completely from the test: please
> see comments inline.
Done. Thanks.
> (R3) Minor simplifications, see comments inline.
Done. (Although, I believe that they are not really relevant).
> (R4) Please source include/show_rpl_debug_info.inc before die
>
> /Sven
>
>
> Luis Soares wrote:
> > #At file:///home/lsoares/Workspace/bzr/work/bugfixing/52166/mysql-5.1-rep%2B2/
> based on revid:zhenxing.he@stripped
> >
> > 3183 Luis Soares 2010-04-15
> > BUG#52166: Seconds_Behind_Master spikes after long idle period
> >
[snip]
> >
> > === added file 'mysql-test/suite/rpl/t/seconds_behind_master.test'
> > --- a/mysql-test/suite/rpl/t/seconds_behind_master.test 1970-01-01 00:00:00
> +0000
> > +++ b/mysql-test/suite/rpl/t/seconds_behind_master.test 2010-04-15 16:39:31
> +0000
> > @@ -0,0 +1,152 @@
>
> (R1)
>
> Please add comments summarizing what properties are being tested
> (purpose), and how they are being tested (implementation).
Done. Added one sentence per assertion and I don't think it needs
more.
> My impression is that the parts that you borrowed from
> rpl_replication_delay are very similar to the parts that you introduced.
> If there is a difference, it would be good to list the tested properties
> together so that it is clear that there are no redundant tests.
Hmm... some may be, yes! But still, they are stepping stones to the
last BUG#29309 assertion.
> > +-- source include/master-slave.inc
> > +-- source include/have_debug.inc
> > +
> > +# make the test only run once (STMT is actually needed because of the SLEEPs
> below)
> > +-- source include/have_binlog_format_statement.inc
> > +
> > +SET @old_debug= @@global.debug;
> > +call mtr.add_suppression('Unsafe statement binlogged in statement format since
> BINLOG_FORMAT = STATEMENT');
> > +
> > +#
> > +# Based on BUG#29309 test case
>
> (R1)
>
> After you removed rpl_replication_delay, this actually *is* the
> BUG#29309 test case, you just rewrote it, right? If we say that test X
> is based on test Y, then I think test Y must exist (and be referred to
> in an explicit manner, like a filename); otherwise it will be very hard
> to figure out what is going on for someone who doesn't know the context.
>
> Also, please explain what property is being tested here.
Done.
> > +#
> > +
> > +connection master;
> > +create table t1 (f1 int);
> > +sync_slave_with_master;
> > +source include/stop_slave.inc;
> > +connection master;
> > +insert into t1 values (1);
> > +
> > +sleep 3;
> > +
> > +#connection slave;
>
> (R3) This line can be removed.
Done.
> > +connection slave;
> > +lock table t1 write;
> > +source include/start_slave.inc;
> > +
> > +# ok, now wait for the SQL thread to start the insert
> > +let $wait_timeout= 60; # Override default of 30 seconds with 60.
> > +let $show_statement= SHOW PROCESSLIST;
> > +let $field= Info;
> > +let $condition= = 'insert into t1 values (1)';
> > +source include/wait_show_condition.inc;
> > +
> > +let $sbm= query_get_value("SHOW SLAVE STATUS", Seconds_Behind_Master, 1);
> > +if (`SELECT $sbm < 3 or $sbm > 6`)
> > +{
> > + -- echo #### Seconds behind master must show value approx. to 3
> > + -- echo #### giving a 3 secs margin (machine where the test executes
> > + -- echo #### may be slow). Best effort approach.
> > + -- echo #### $sbm > 6 || $sbm < 3
>
> (R1) This paragraph doesn't parse well for me. Please try to restructure
> the text so that it is clear what property was being tested and failed.
Done. It's overly verbose. Changed to one sentence. The user
can grasp the idea reading the source. When the test fails he'll
have a hint, then he must check the test case, but I think the
source is really easy to follow.
> (R4) Also, please source include/show_rpl_debug_info.inc before die.
Done.
> (R2) Also, I think the test could be written without races: if you read
> UNIX_TIMESTAMP() before executing 'insert' on master, and read
> UNIX_TIMESTAMP() again after you read Seconds_Behind_Master, then the
> upper bound on $sbm is the difference between the two.
Done.
> > + -- die
> > +}
> > +
> > +unlock tables;
> > +connection master;
> > +sync_slave_with_master;
> > +lock table t1 write;
> > +connection master;
> > +insert into t1 values (2);
> > +
> > +sleep 3;
> > +
> > +connection slave;
> > +# ok, now wait for the SQL thread to start the insert
> > +let $wait_timeout= 60; # Override default of 30 seconds with 60.
> > +let $show_statement= SHOW PROCESSLIST;
> > +let $field= Info;
> > +let $condition= = 'insert into t1 values (2)';
> > +source include/wait_show_condition.inc;
> > +
> > +flush logs /* should not reset the sbm value */;
> > +
> > +sleep 3;
> > +
> > +let $sbm= query_get_value("SHOW SLAVE STATUS", Seconds_Behind_Master, 1);
> > +if (`SELECT $sbm < 6 or $sbm >= 9`)
> > +{
> > + -- echo #### Seconds behind master must show value approx. to 6
> > + -- echo #### giving a 3 secs margin (machine where the test executes
> > + -- echo #### may be slow). Best effort approach.
> > + -- echo #### $sbm < 6 || $sbm >= 9
>
> (R1), (R4), (R2) Similar to above: please clarify what tested property
> failed to hold; source include/show_rpl_debug_info.inc before die; and
> consider using UNIX_TIMESTAMP() to remove races from the test.
Done.
> > + -- die
> > +}
> > +
> > +unlock tables;
> > +connection master;
> > +drop table t1;
> > +sync_slave_with_master;
> > +
> > +#
> > +# BUG#52166 test case
> > +#
> > +
> > +# 1. Check that seconds behind master is set to 0 when SQL
> > +# thread waits for more events.
> > +# 2. Check that seconds behind master is updated once after
> > +# a newly received event starts to be processed.
> > +
> > +-- source include/master-slave-reset.inc
> > +-- connection master
> > +
> > +# this will increase the timestamp +3600
> > +SET GLOBAL DEBUG= '+d,dec_event_time_by_1_hour';
> > +CREATE TABLE t1 (a INT);
> > +SET GLOBAL DEBUG= '-d,dec_event_time_by_1_hour';
>
> (R3) Why are we altering the event's timestamp? This doesn't seem to be
> used. Suggest to remove this.
Done.
NOTE: We are not altering, we are just restoring the
correct behavior. But anyway, I have changed this.
> > +-- sync_slave_with_master
> > +
> > +# ok, now wait for the SQL thread to sleep
> > +let $wait_timeout= 60; # Override default of 30 seconds with 60.
> > +let $show_statement= SHOW PROCESSLIST;
> > +let $field= State;
> > +let $condition= = 'Slave has read all relay log; waiting for the slave I/O
> thread to update it';
> > +-- source include/wait_show_condition.inc
> > +
> > +-- let $sbm= query_get_value("SHOW SLAVE STATUS", Seconds_Behind_Master, 1)
> > +if (`SELECT $sbm <> 0`)
> > +{
> > + -- echo #### Seconds behind master must show '0', because SQL thread is now
> waiting for new events.
> > + -- echo #### SBM ($sbm seconds) != 0 !! SQL thread is waiting so should be
> zero.
> > + -- die
> > +}
> > +
> > +-- connection master
> > +
> > +# 5 seconds this will give us time at the slave to
> > +# show that the Seconds_Behind_Master will increase
> > +# when this query starts to execute
>
> (R1) This sentence assumes that the reader already understands what is
> being tested. Please explain that! Also, the number 5 doesn't match the
> sleep of 10 seconds below.
Done.
> > +-- disable_warnings
> > +SET GLOBAL DEBUG= '+d,dec_event_time_by_1_hour';
> > +INSERT INTO t1 VALUES (sleep (10));
> > +SET GLOBAL DEBUG= '-d,dec_event_time_by_1_hour';
>
> (R2) This test contains a race due to the sleep. I think we can use LOCK
> TABLES, as above, to get rid of the race. I.e.:
>
> slave> LOCK TABLES t1
> master> INSERT INTO t1 VALUES(1); # with modified timestamp
> slave> wait_show_condition # until INSERT shows up in processlist
> slave> query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1)
> etc.
Done.
> > +DROP TABLE t1;
>
> (R3) For clarity, I'd suggest to move this cleanup code until the end.
Done.
NOTE: Not really needed, but OK, I've done it.
> > +-- enable_warnings
> > +
> > +-- connection slave
> > +
> > +# ok, now wait for the SQL thread to start the insert
> > +let $wait_timeout= 60; # Override default of 30 seconds with 60.
> > +let $show_statement= SHOW PROCESSLIST;
> > +let $field= Info;
> > +let $condition= = 'INSERT INTO t1 VALUES (sleep (10))';
> > +-- source include/wait_show_condition.inc
> > +
> > +-- sleep 2
> > +
> > +-- let $sbm= query_get_value("SHOW SLAVE STATUS", Seconds_Behind_Master, 1)
> > +if (`SELECT $sbm <= 3600`)
> > +{
> > + -- echo #### Seconds behind master must show > 3600 because the event that
>
> > + -- echo #### is being processed was issued 3600 seconds ago.
> > + -- echo #### SBM ($sbm seconds) <= 3600 !
> > + -- die
> > +}
> > +
> > +-- connection master
> > +-- sync_slave_with_master
> > +-- connection master
> > +
> > +SET @@global.debug= @old_debug;
> > +
> > +-- source include/master-slave-end.inc
> >
> >
[snip]
> > === modified file 'sql/slave.cc'
> > --- a/sql/slave.cc 2010-02-12 23:30:44 +0000
> > +++ b/sql/slave.cc 2010-04-15 16:39:31 +0000
> > @@ -2373,6 +2373,18 @@ static int exec_relay_log_event(THD* thd
> > {
> > int exec_res;
> >
> > + /*
> > + Even if we don't execute this event, we keep the master timestamp,
> > + so that seconds behind master shows correct delta (there are events
> > + that are not replayed, so we keep falling behind).
> > +
> > + If it is an artificial event, or a relay log event (IO thread generated
> > + event) or ev->when is set to 0, we don't update the
> > + last_master_timestamp.
> > + */
> > + if (!(ev->is_artificial_event() || ev->is_relay_log_event() ||
> (ev->when == 0)))
> > + rli->last_master_timestamp= ev->when;
> > +
FYI: Some events (almost all of them), don't contain valid
data on exec_time. These will show zero exec_time.
> > /*
> > This tests if the position of the beginning of the current event
> > hits the UNTIL barrier.
> >
[snip]
>
> --
> Sven Sandberg, Software Engineer
> MySQL AB, www.mysql.com
>
Incremental patch is available at:
http://lists.mysql.com/commits/107068
Regards,
Luís