List:Commits« Previous MessageNext Message »
From:Libing Song Date:January 6 2010 3:22am
Subject:Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3215)
Bug#28421
View as plain text  
Hi Luis,

On Tue, 2010-01-05 at 15:14 +0000, Luís Soares wrote:
> Nǐ hǎo ma? ;)
> 
> Libing, please, just find a few questions inline.
> 
> On Tue, 2010-01-05 at 18:27 +0800, Libing Song wrote:
> > On Wed, 2009-12-23 at 14:30 +0000, Luís Soares wrote:
> > > Hi Libing,
> > > 
> > 
> > >        Furthermore, I think you need to clarify what happens. For
> > >        instance, taking your test as an example, if I stop the
> > >        slave thread before master changing the index and then
> > >        restart it after the FLUSH LOGS, the test just hangs... Is
> > >        this expected behavior? Should this be documented? 
> > > 
> > >        The portion of the test that I changed:
> > > 
> > > 
> > > STOP SLAVE IO_THREAD;
> > > source include/wait_for_slave_io_to_stop.inc;
> > We can not do that.
> 
> Does this mean we need documentation effort?
> 
I mean If slave io thread is stopped before the manual change,
we will not reproduce the bug.
> > Manually changing index file is not affect a new staring slave.
> 
> But it should be on FLUSH LOGS in the master side. So what's the 
> impact on the slave? Is this something we should never do?
> 
There is no impact on the slave at all, if it has stopped before the
'FLUSH LOGS' and restarts after 'FLUSH LOGS'. When starting, slave sends
the first file name to master. master gets the position of the file by
calling mysql_bin_log.find_log_pos. So it always gets the right
position.

In contrast, all following binlog files' position are get by using the
prior pos + 1line(this is the root cause of the bug). So this bug
appears only when slave starts before 'FLUSH LOGS'.

> Cheers,
> Luís
> 
> > > # Now, all events has been replicate to slave. As current dump file
> > > # (master-bin.000002) is the last binlog file, so master is waiting for
> > > new
> > > # events.
> > > 
> > > connection master;
> > > # Delete './master-bin.000001' from index file.
> > > let $datadir= `SELECT @@DATADIR`;
> > > remove_file $datadir/master-bin.index;
> > > write_file $datadir/master-bin.index;
> > > ./master-bin.000002
> > > EOF
> > > # Now, only 1 entry in index file.
> > > # ./master-bin.000002
> > > 
> > > 
> > > # Generate master-bin.000003, but it is in the second line.
> > > FLUSH LOGS;
> > > # Now, 2 entries in index file.
> > > # ./master-bin.000002
> > > # ./master-bin.000003
> > > 
> > > -- connection slave
> > > START SLAVE IO_THREAD;
> > > source include/wait_for_slave_io_to_start.inc;
> > > -- connection master
> > > 
> > > 
> > > REQUESTS
> > > --------
> > >  n/a
> > > 
> > > SUGGESTIONS
> > > -----------
> > >  n/a
> > > 
> > > DETAILS 
> > > -------
> > > 
> > > On Thu, 2009-12-10 at 12:10 +0000, Li-Bing.Song@stripped wrote:
> > > > #At file:///home/anders/work/bzrwork/worktree2/mysql-5.1-bugteam/
> based on revid:li-bing.song@stripped
> > > > 
> > > >  3215 Li-Bing.Song@stripped	2009-12-10
> > > >       BUG #28421 Infinite loop on slave relay logs
> > > >       
> > > >       That, manually deleteing one or more entries from
> 'master-bin.index', will
> > > >       cause master infinitely loop to send one binlog file. 
> > > >       
> > > 
> > > suggestion: Maybe loose the 'That' ^, and just make:
> > >             "Manually deleting one or more ..."
> > > 
> > > >       When starting a dump session, master opens index file and search
> the binlog file
> > > >       which is being requested by the slave. The position of the
> binlog file in the
> > > >       index file is recorded. it will be used to find the next binlog
> file when current
> > > >       binlog file has been dumped completely. As only the position is
> used, it will can
> > > >       not get the correct file if some entries has been removed
> manually from the index file.
> > > 
> > > suggestion: It may not get the correct file
> > > 
> > > >       the master will reopen the current binlog file which has been
> dump completely
> > > 
> > > suggestion: s/has been dump/has dumped 
> > > 
> > > >       and redump it if it can not get the next binlog file's name from
> index file.
> > > >       It obviously is a logical error.
> > > >       
> > > >       
> > > >       Manually changing index file is a illegal action, so after this
> patch, master
> > > >       sends a fatal error to slave and close the dump session if a new
> binlog file
> > > >       has been generated and master can not get it from the index
> file.
> > > 
> > > See RC3.
> > > 
> > > >     added:
> > > >       mysql-test/suite/rpl/r/rpl_manual_change_index_file.result
> > > >       mysql-test/suite/rpl/t/rpl_manual_change_index_file.test
> > > >     modified:
> > > >       sql/sql_repl.cc
> > > > === added file
> 'mysql-test/suite/rpl/r/rpl_manual_change_index_file.result'
> > > > ---
> a/mysql-test/suite/rpl/r/rpl_manual_change_index_file.result	1970-01-01 00:00:00 +0000
> > > > +++
> b/mysql-test/suite/rpl/r/rpl_manual_change_index_file.result	2009-12-10 12:10:33 +0000
> > > > @@ -0,0 +1,25 @@
> > > > +stop slave;
> > > > +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
> > > > +reset master;
> > > > +reset slave;
> > > > +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
> > > > +start slave;
> > > > +FLUSH LOGS;
> > > > +CREATE TABLE t1(c1 INT);
> > > > +FLUSH LOGS;
> > > > +call mtr.add_suppression('Got fatal error 1236 from master when
> reading data from binary log: .*could not find next log');
> > > > +Last_IO_Error
> > > > +Got fatal error 1236 from master when reading data from binary log:
> 'could not find next log'
> > > > +CREATE TABLE t2(c1 INT);
> > > > +FLUSH LOGS;
> > > > +CREATE TABLE t3(c1 INT);
> > > > +FLUSH LOGS;
> > > > +CREATE TABLE t4(c1 INT);
> > > > +START SLAVE IO_THREAD;
> > > > +SHOW TABLES;
> > > > +Tables_in_test
> > > > +t1
> > > > +t2
> > > > +t3
> > > > +t4
> > > > +DROP TABLE t1, t2, t3, t4;
> > > > 
> > > > === added file
> 'mysql-test/suite/rpl/t/rpl_manual_change_index_file.test'
> > > > ---
> a/mysql-test/suite/rpl/t/rpl_manual_change_index_file.test	1970-01-01 00:00:00 +0000
> > > > +++
> b/mysql-test/suite/rpl/t/rpl_manual_change_index_file.test	2009-12-10 12:10:33 +0000
> > > > @@ -0,0 +1,69 @@
> > > > +source include/master-slave.inc;
> > > > +
> > > > +#
> > > > +# BUG#28421 Infinite loop on slave relay logs
> > > > +# 
> > > > +# That, manually deleteing one or more entries from
> 'master-bin.index', will
> > > > +# cause master infinitely loop to send one binlog file.
> > > > +# 
> > > > +# Manually changing index file is a illegal action, so when this
> happen, we
> > > > +# send a fatal error to slave and close the dump session.
> > > > +
> > > > +FLUSH LOGS;
> > > > +# Now, 2 entries in index file.
> > > > +# ./master-bin.000001
> > > > +# ./master-bin.000002
> > > > +
> > > > +CREATE TABLE t1(c1 INT);
> > > > +# Now, the current dump file(master-bin.000002) is the second line of
> index
> > > > +# file
> > > > +sync_slave_with_master;
> > > > +# Now, all events has been replicate to slave. As current dump file
> > > > +# (master-bin.000002) is the last binlog file, so master is waiting
> for new
> > > > +# events.
> > > > +
> > > > +connection master;
> > > > +# Delete './master-bin.000001' from index file.
> > > > +let $datadir= `SELECT @@DATADIR`;
> > > > +remove_file $datadir/master-bin.index;
> > > > +write_file $datadir/master-bin.index;
> > > > +./master-bin.000002
> > > > +EOF
> > > 
> > > RC1 and RC2 ^
> > > 
> > > > +# Now, only 1 entry in index file.
> > > > +# ./master-bin.000002
> > > > +
> > > > +# Generate master-bin.000003, but it is in the second line.
> > > > +FLUSH LOGS;
> > > > +# Now, 2 entries in index file.
> > > > +# ./master-bin.000002
> > > > +# ./master-bin.000003
> > > > +
> > > > +# Now, master know that new binlog file(master-bin.000003) has been
> generated.
> > > > +# It expects that the new binlog file is in third line of index file,
> but
> > > > +# there is no third line in index file. It is so strange that master
> sends an
> > > > +# error to slave. 
> > > > +call mtr.add_suppression('Got fatal error 1236 from master when
> reading data from binary log: .*could not find next log');
> > > > +connection slave;
> > > > +source include/wait_for_slave_io_to_stop.inc;
> > > > +let $last_error= query_get_value(SHOW SLAVE STATUS, Last_IO_Error,
> 1);
> > > > +echo Last_IO_Error;
> > > > +echo $last_error;
> > > > +
> > > > +connection master;
> > > > +CREATE TABLE t2(c1 INT);
> > > > +FLUSH LOGS;
> > > > +CREATE TABLE t3(c1 INT);
> > > > +FLUSH LOGS;
> > > > +CREATE TABLE t4(c1 INT);
> > > > +
> > > > +connection slave;
> > > > +START SLAVE IO_THREAD;
> > > > +source include/wait_for_slave_io_to_start.inc;
> > > > +
> > > > +connection master;
> > > > +sync_slave_with_master;
> > > > +SHOW TABLES;
> > > > +
> > > > +connection master;
> > > > +DROP TABLE t1, t2, t3, t4;
> > > > +source include/master-slave-end.inc;
> > > > 
> > > > === modified file 'sql/sql_repl.cc'
> > > > --- a/sql/sql_repl.cc	2009-09-18 08:20:29 +0000
> > > > +++ b/sql/sql_repl.cc	2009-12-10 12:10:33 +0000
> > > > @@ -711,11 +711,14 @@ impossible position";
> > > >  
> > > >        thd_proc_info(thd, "Finished reading one binlog; switching to
> next binlog");
> > > >        switch (mysql_bin_log.find_next_log(&linfo, 1)) {
> > > > -      case LOG_INFO_EOF:
> > > > -	loop_breaker = (flags & BINLOG_DUMP_NON_BLOCK);
> > > > -	break;
> > > >        case 0:
> > > >  	break;
> > > > +      case LOG_INFO_EOF:
> > > > +        if (mysql_bin_log.is_active(log_file_name))
> > > > +        {
> > > > +          loop_breaker = (flags & BINLOG_DUMP_NON_BLOCK);
> > > > +          break;
> > > > +        }
> > > >        default:
> > > >  	errmsg = "could not find next log";
> > > >  	my_errno= ER_MASTER_FATAL_ERROR_READING_BINLOG;
> > > > 
> > > 
> > > 
> > > 
> > 
> > 
> > -- 
> > Your Sincerely,
> > Libing Song
> > ==================================
> > MySQL Replication Team
> > Software Engineer
> > Certified (ISC)2 CISSP
> > 
> > Email : Li-Bing.Song@stripped
> > Skype : libing.song
> > MSN   : slb_database@stripped
> > Phone : +86 010-6505-4020 ext. 319
> > Mobile: +86 138-1144-2038
> > ==================================
> > 
> > 
> > 
> > -- 
> > MySQL Code Commits Mailing List
> > For list archives: http://lists.mysql.com/commits
> > To unsubscribe:    http://lists.mysql.com/commits?unsub=1
> > 
> 
> 


-- 
Your Sincerely,
Libing Song
==================================
MySQL Replication Team
Software Engineer
Certified (ISC)2 CISSP

Email : Li-Bing.Song@stripped
Skype : libing.song
MSN   : slb_database@stripped
Phone : +86 010-6505-4020 ext. 319
Mobile: +86 138-1144-2038
==================================


Thread
bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3215) Bug#28421Li-Bing.Song10 Dec
  • Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3215)Bug#28421He Zhenxing11 Dec
    • Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3215)Bug#28421He Zhenxing15 Dec
  • Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3215)Bug#28421Luís Soares23 Dec
    • Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3215)Bug#28421Libing Song5 Jan
      • Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3215)Bug#28421Luís Soares5 Jan
        • Re: bzr commit into mysql-5.1-bugteam branch (Li-Bing.Song:3215)Bug#28421Libing Song6 Jan