List:Commits« Previous MessageNext Message »
From:Luís Soares Date:October 26 2010 6:54am
Subject:Re: bzr commit into mysql-next-mr branch (Serge.Kozlov:3176) WL#5064
View as plain text  
Hi Serge,

On 10/24/2010 09:48 PM, Serge.Kozlov wrote:
> Hi, Luis
>
> Thanks for review!
>
> Find my replies below inline. New patch is in commit email
> http://lists.mysql.com/commits/121768
>
> 22.10.2010 15:50, Luís Soares пишет:
>> Hi Serge,
>>
>> Nice Work. Please find my review comments below.
>>
>> STATUS
>> ------
>>
>> Not approved.
>>
>> REQUIRED CHANGES
>> ----------------
>>
>> RC1. I think it is better to leave the checksum bytes out of
>> the fault injection part. Maybe make:
>>
>> + int debug_cor_pos = rand() % event_len;
>>
>> into
>>
>> + int debug_cor_pos = rand() % (event_len - BINLOG_CHECKSUM_LEN);
>
> Agree. Fixed.

OK

>> RC2. I think that in the fault injection code, we should avoid
>> corrupting FD events. They are always read from the binary
>> log when opening it, so it's likely that one
>> will be corrupting the FD event instead of the one one wants.
>>
>> See for instance, see, binlog.cc:show_binlog_events
>>
>> Something like:
>>
>> if (buf[EVENT_TYPE_OFFSET] != FORMAT_DESCRIPTION_EVENT)
>> {
>> /* do corruption injection */
>> }
>>
>> This would probably leave FD event corruption uncovered... maybe
>> you can think of something else...
>
> Agree. Fixed

OK.

>> RC3. I think you can drop the wait_for_slave_io_to_stop.inc and
>> the instructions that follow it:
>>
>> +START SLAVE;
>> +--source include/wait_for_slave_io_to_stop.inc
>> +let $io_error= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>> +echo Last_IO_Errno: $io_error;
>>
>> and replace them with:
>>
>> START SLAVE IO_THREAD;
>> let $slave_io_errno= XXXX;
>> source include/wait_for_slave_io_error.inc
>>
>> The same applies to the SQL thread assertion.
>
> Agree. It is better solution than mine. Fixed.

OK.

>> RC4. At the end of the test (when all fault injections are
>> disabled), would be nice to start the slave and compare
>> both tables from the master and the slave:
>>
>> -- connection slave
>> SET GLOBAL debug= "";
>> -- source include/stop_slave.inc
>>
>> -- connection master
>> SET GLOBAL debug= "";
>> -- connection slave
>> -- source include/start_slave.inc
>>
>> -- let $diff_table_1= master:test.t1
>> -- let $diff_table_2= slave:test.t1
>> -- source include/diff_tables.inc
>>
>> # them proceed to clean up.
>
> Agree. Added to test case

OK.

>> RC5. I think we have a coverage issue. read_log_event in DUMP
>> thread is not covered. Can we get it covered? I wonder if
>> it is as simple as:
>>
>> 1. at the slave, stop slave
>> 2. at the master, SET GLOBAL debug="+d,corrupt_read_log_event";
>> 3. at the slave, start slave
>> 4. search
>
> Hm. The first block of code with SHOW BINLOG EVENTS does testing of corruption in
> binlog. What do you mean? E.g. We does
> a corruption in binlog and try to catch it on the slave?

Hmm... I wonder why didn't I finish the sequence... Sorry I hit the
send button too soon.

I was basically wondering what if read_log_event would fail (because
checksum check fails) inside rpl_master.cc:mysql_binlog_send. I trust
the error handling to be good, but we don't cover this case on the
test case.

So the sequence would be:

   1. at the slave, stop slave
   2. at the master, SET GLOBAL debug= "+d,corrupt_read_log_event";
   3. at the slave, start slave
   4. check that proper error is reported

Checking with SHOW BINLOG EVENTS is good, but not sufficient, IMHO.

>> REQUESTS
>> --------
>>
>> R1. This is a request for clarification.
>>
>> Why this change?
>>
>> ev= Log_event::read_log_event(&log, (mysql_mutex_t*)0, description_event,
>> - opt_master_verify_checksum);
>> + 0);
>>


Why this change ?

Regards,
Luís Soares

>> SUGGESTIONS
>> -----------
>> n/a
>>
>> DETAILS
>> -------
>>
>> n/a
>> On 10/20/2010 12:41 PM, Serge Kozlov wrote:
>>> #At file:///home/ksm/oracle/repo/WL5064/mysql-next-mr-wl2540-commit/ based
> on
>>> revid:aelkin@stripped
>>>
>>> 3176 Serge Kozlov 2010-10-20
>>> WL#5064 Emulate the corruption of events for replication at vary stages:
> reading from binlog/relay log, receiving by
>>> slave from network
>>>
>>> added:
>>> mysql-test/suite/rpl/r/rpl_corruption.result
>>> mysql-test/suite/rpl/t/rpl_corruption-master.opt
>>> mysql-test/suite/rpl/t/rpl_corruption-slave.opt
>>> mysql-test/suite/rpl/t/rpl_corruption.test
>>> modified:
>>> sql/binlog.cc
>>> sql/log_event.cc
>>> sql/mysqld.cc
>>> sql/rpl_slave.cc
>>> === added file 'mysql-test/suite/rpl/r/rpl_corruption.result'
>>> --- a/mysql-test/suite/rpl/r/rpl_corruption.result 1970-01-01 00:00:00 +0000
>>> +++ b/mysql-test/suite/rpl/r/rpl_corruption.result 2010-10-20 11:41:42 +0000
>>> @@ -0,0 +1,27 @@
>>> +stop slave;
>>> +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
>>> +reset master;
>>> +reset slave;
>>> +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
>>> +start slave;
>>> +call mtr.add_suppression('Found invalid event in binary log');
>>> +call mtr.add_suppression('Slave I/O: Relay log write failure: could not
> queue event from master');
>>> +call mtr.add_suppression('Replication event checksum verification failed');
>>> +include/stop_slave.inc
>>> +CREATE TABLE t1 (a INT NOT NULL PRIMARY KEY, b VARCHAR(10), c TEXT);
>>> +* insert/update/delete rows in table t1 *
>>> +SET GLOBAL debug="+d,corrupt_read_log_event";
>>> +SHOW BINLOG EVENTS;
>>> +ERROR HY000: Error when executing command SHOW BINLOG EVENTS: Wrong offset
> or I/O error
>>> +SET GLOBAL debug="+d,corrupt_queue_event";
>>> +START SLAVE;
>>> +Last_IO_Errno: 1595
>>> +SET GLOBAL debug="-d,corrupt_queue_event";
>>> +SET GLOBAL debug="+d,corrupt_read_log_event";
>>> +START SLAVE;
>>> +Last_SQL_Errno: 1594
>>> +SET GLOBAL debug="-d,corrupt_read_log_event";
>>> +include/start_slave.inc
>>> +SET GLOBAL debug="";
>>> +DROP TABLE t1;
>>> +SET GLOBAL debug="";
>>>
>>> === added file 'mysql-test/suite/rpl/t/rpl_corruption-master.opt'
>>> --- a/mysql-test/suite/rpl/t/rpl_corruption-master.opt 1970-01-01 00:00:00
> +0000
>>> +++ b/mysql-test/suite/rpl/t/rpl_corruption-master.opt 2010-10-20 11:41:42
> +0000
>>> @@ -0,0 +1 @@
>>> +--binlog-checksum=CRC32 --master-verify-checksum=1
>>>
>>> === added file 'mysql-test/suite/rpl/t/rpl_corruption-slave.opt'
>>> --- a/mysql-test/suite/rpl/t/rpl_corruption-slave.opt 1970-01-01 00:00:00
> +0000
>>> +++ b/mysql-test/suite/rpl/t/rpl_corruption-slave.opt 2010-10-20 11:41:42
> +0000
>>> @@ -0,0 +1 @@
>>> +--binlog-checksum=CRC32 --slave-sql-verify-checksum=1
>>>
>>> === added file 'mysql-test/suite/rpl/t/rpl_corruption.test'
>>> --- a/mysql-test/suite/rpl/t/rpl_corruption.test 1970-01-01 00:00:00 +0000
>>> +++ b/mysql-test/suite/rpl/t/rpl_corruption.test 2010-10-20 11:41:42 +0000
>>> @@ -0,0 +1,85 @@
>>> +############################################################
>>> +# Author: Serge Kozlov<serge.kozlov@stripped>
>>> +# Date: 17 Oct 2010
>>> +# Purpose: WL#5064 Testing with corrupted events.
>>> +# The test emulates the corruption at the vary stages
>>> +# of replication:
>>> +# - in binlog file
>>> +# - in network
>>> +# - in relay log
>>> +############################################################
>>> +
>>> +--source include/have_debug.inc
>>> +--source include/master-slave.inc
>>> +
>>> +call mtr.add_suppression('Found invalid event in binary log');
>>> +call mtr.add_suppression('Slave I/O: Relay log write failure: could not
> queue event from master');
>>> +call mtr.add_suppression('Replication event checksum verification failed');
>>> +
>>> +# Stop slave
>>> +--connection slave
>>> +--source include/stop_slave.inc
>>> +
>>> +# Create test table with data
>>> +--connection master
>>> +CREATE TABLE t1 (a INT NOT NULL PRIMARY KEY, b VARCHAR(10), c TEXT);
>>> +--echo * insert/update/delete rows in table t1 *
>>> +# Corruption algorithm modifies only the first event and
>>> +# then will be reset. To avoid checking always the first event
>>> +# from binlog (usually it is FD) we randomly execute different
>>> +# statements against server and call SHOW BINLOG EVENTS for
>>> +# last event from binlog. It provides different types of event
>>> +# for corruption.
>>> +--disable_query_log
>>> +let $i=`SELECT 1+CEILING(10*RAND())`;
>>> +let $j=1;
>>> +while ($i) {
>>> + let $pos= query_get_value(SHOW MASTER STATUS, Position, 1);
>>> + eval INSERT INTO t1 VALUES ($j, 'a', NULL);
>>> + if (`SELECT RAND()> 0.7`)
>>> + {
>>> + let $pos= query_get_value(SHOW MASTER STATUS, Position, 1);
>>> + eval UPDATE t1 SET c = REPEAT('a', 20) WHERE a = $j;
>>> + }
>>> + if (`SELECT RAND()> 0.8`)
>>> + {
>>> + let $pos= query_get_value(SHOW MASTER STATUS, Position, 1);
>>> + eval DELETE FROM t1 WHERE a = $j;
>>> + }
>>> + dec $i;
>>> + inc $j;
>>> +}
>>> +--enable_query_log
>>> +# Emulate corruption in binlog file
>>> +SET GLOBAL debug="+d,corrupt_read_log_event";
>>> +--echo SHOW BINLOG EVENTS;
>>> +--disable_query_log
>>> +send_eval SHOW BINLOG EVENTS FROM $pos;
>>> +--enable_query_log
>>> +--error ER_ERROR_WHEN_EXECUTING_COMMAND
>>> +reap;
>>> +
>>> +# Emulate corruption in network
>>> +--connection slave
>>> +SET GLOBAL debug="+d,corrupt_queue_event";
>>> +START SLAVE;
>>> +--source include/wait_for_slave_io_to_stop.inc
>>> +let $io_error= query_get_value(SHOW SLAVE STATUS, Last_IO_Errno, 1);
>>> +echo Last_IO_Errno: $io_error;
>>> +SET GLOBAL debug="-d,corrupt_queue_event";
>>> +
>>> +# Emulate corruption in relay log
>>> +SET GLOBAL debug="+d,corrupt_read_log_event";
>>> +START SLAVE;
>>> +--source include/wait_for_slave_sql_to_stop.inc
>>> +let $sql_error= query_get_value(SHOW SLAVE STATUS, Last_SQL_Errno, 1);
>>> +echo Last_SQL_Errno: $sql_error;
>>> +SET GLOBAL debug="-d,corrupt_read_log_event";
>>> +
>>> +# Clean up
>>> +--source include/start_slave.inc
>>> +--connection master
>>> +SET GLOBAL debug="";
>>> +DROP TABLE t1;
>>> +--sync_slave_with_master
>>> +SET GLOBAL debug="";
>>>
>>> === modified file 'sql/binlog.cc'
>>> --- a/sql/binlog.cc 2010-09-28 15:42:48 +0000
>>> +++ b/sql/binlog.cc 2010-10-20 11:41:42 +0000
>>> @@ -1267,7 +1267,7 @@ bool show_binlog_events(THD *thd, MYSQL_
>>> Rotate then Format_desc).
>>> */
>>> ev= Log_event::read_log_event(&log, (mysql_mutex_t*)0,
> description_event,
>>> - opt_master_verify_checksum);
>>> + 0);
>>> if (ev)
>>> {
>>> if (ev->get_type_code() == FORMAT_DESCRIPTION_EVENT)
>>>
>>> === modified file 'sql/log_event.cc'
>>> --- a/sql/log_event.cc 2010-09-28 15:42:48 +0000
>>> +++ b/sql/log_event.cc 2010-10-20 11:41:42 +0000
>>> @@ -1358,7 +1358,6 @@ Log_event* Log_event::read_log_event(con
>>> DBUG_ASSERT(description_event != 0);
>>> DBUG_PRINT("info", ("binlog_version: %d",
> description_event->binlog_version));
>>> DBUG_DUMP("data", (unsigned char*) buf, event_len);
>>> -
>>> /* Check the integrity */
>>> if (event_len< EVENT_LEN_OFFSET ||
>>> buf[EVENT_TYPE_OFFSET]>= ENUM_END_EVENT ||
>>> @@ -1393,7 +1392,17 @@ Log_event* Log_event::read_log_event(con
>>> */
>>> alg= (event_type != FORMAT_DESCRIPTION_EVENT) ?
>>> description_event->checksum_alg : get_checksum_alg(buf, event_len);
>>> -
>>> + // Emulate the corruption during reading an event
>>> + DBUG_EXECUTE_IF("corrupt_read_log_event",
>>> + if (crc_check)
>>> + {
>>> + char *debug_event_buf_c = (char *)buf;
>>> + int debug_cor_pos = rand() % event_len;
>>> + debug_event_buf_c[debug_cor_pos] =~ debug_event_buf_c[debug_cor_pos];
>>> + DBUG_PRINT("info", ("Corrupt the event at
> Log_event::read_log_event(char*,...): byte on position %d", debug_cor_pos));
>>> + DBUG_SET("-d,corrupt_read_log_event");
>>> + }
>>> + );
>>> if (crc_check&&
>>> event_checksum_test((uchar *) buf, event_len, alg))
>>> {
>>>
>>> === modified file 'sql/mysqld.cc'
>>> --- a/sql/mysqld.cc 2010-09-28 15:06:04 +0000
>>> +++ b/sql/mysqld.cc 2010-10-20 11:41:42 +0000
>>> @@ -4560,6 +4560,7 @@ int mysqld_main(int argc, char **argv)
>>>
>>> #ifndef DBUG_OFF
>>> test_lc_time_sz();
>>> + srand(time(NULL));
>>> #endif
>>>
>>> /*
>>>
>>> === modified file 'sql/rpl_slave.cc'
>>> --- a/sql/rpl_slave.cc 2010-09-28 15:06:04 +0000
>>> +++ b/sql/rpl_slave.cc 2010-10-20 11:41:42 +0000
>>> @@ -4343,6 +4343,14 @@ static int queue_event(Master_info* mi,c
>>> DBUG_ASSERT(mi->rli.relay_log.relay_log_checksum_alg !=
>>> BINLOG_CHECKSUM_ALG_UNDEF);
>>>
>>> + // Emulate the network corruption
>>> + DBUG_EXECUTE_IF("corrupt_queue_event",
>>> + char *debug_event_buf_c = (char*) buf;
>>> + int debug_cor_pos = rand() % event_len;
>>> + debug_event_buf_c[debug_cor_pos] =~ debug_event_buf_c[debug_cor_pos];
>>> + DBUG_PRINT("info", ("Corrupt the event at queue_event: byte on position
> %d", debug_cor_pos));
>>> + DBUG_SET("-d,corrupt_queue_event");
>>> + );
>>> if (event_checksum_test((uchar *) buf, event_len, checksum_alg))
>>> {
>>> error= ER_NETWORK_READ_EVENT_CHECKSUM_FAILURE;
>>>
>>>
>>>
>>>
>>>
>
>

Thread
bzr commit into mysql-next-mr branch (Serge.Kozlov:3176) WL#5064Serge Kozlov20 Oct
  • Re: bzr commit into mysql-next-mr branch (Serge.Kozlov:3176) WL#5064Luís Soares22 Oct
    • Re: bzr commit into mysql-next-mr branch (Serge.Kozlov:3176) WL#5064Serge.Kozlov24 Oct
      • Re: bzr commit into mysql-next-mr branch (Serge.Kozlov:3176) WL#5064Luís Soares26 Oct
        • Re: bzr commit into mysql-next-mr branch (Serge.Kozlov:3176) WL#5064Serge.Kozlov27 Oct
Re: bzr commit into mysql-next-mr branch (Serge.Kozlov:3176) WL#5064Luís Soares28 Oct