List:Commits« Previous MessageNext Message »
From:Sven Sandberg Date:September 24 2010 11:59am
Subject:Re: bzr commit into mysql-5.1-bugteam branch (luis.soares:3515) Bug#55263
View as plain text  
Hi Luís,

Thanks for this work. I think the patch is good now, approved!

One remark though. I think this comment is really a great analysis that 
covers all cases, and it was needed to understand this bug. But I think 
it's a bit too big to include in the source code. I think it would be 
better to make this a changeset comment. Now that the seek is well 
encapsulated in the SQL thread's code, there is nothing strange in the 
code anymore that needs such a big motivation (it was a bit strange 
before, when the IO thread did a seek on behalf of the SQL thread).

Perhaps you can just leave a short note, like "We need to seek to 
position 0, because the read position may be non-zero if: (1) this log 
has been open before (e.g., because the SQL thread was reading from it, 
then the user issued 'CHANGE MASTER TO previous_log' and then the SQL 
thread caught up to this log again), and (2) the file was never closed 
(e.g., because the IO thread is still writing to it).".

So I think it would be ok to remove this comment. I know Alfranio 
requested it, so feel free to do as you like.

/Sven


Luis Soares wrote:
> #At file:///home/lsoares/Workspace/bzr/work/bugfixing/55263/mysql-5.1-bugteam/ based
> on revid:ingo.struewing@stripped
> 
>  3515 Luis Soares	2010-09-24
>       BUG#55263: assert in check_binlog_magic
>       
>       The procedure for setting up a fake binary log, by changing the
>       relay log files manually, is run twice when we issue mtr with
>       --repeat=2. However, when setting it up for the second time,
>       neither the sql thread is reset nor the server is restarted. This
>       means that previous stale relay log IO_CACHE metadata - from 
>       first run - is left around. As a consequence, when the test is 
>       run for the second time, the IO_CACHE for the relay log has 
>       position in file different than zero, triggering the assertion.
>                   
>       We fix this by deploying a call to my_b_seek before calling
>       check_binlog_magic in next_event. This prevents the server 
>       from asserting, in the cases that the SQL thread was reads
>       from a hot log (relay.NNNNN), then is stopped, then is restarted 
>       from a previous cold log (relay.MMMMM), and then it reaches 
>       the hot log relay.NNNNN again, in which case, the read 
>       coordinates are not set to zero, but to the old values.
>       
>       Additionally, some comments to the source code were added.
> 
>     modified:
>       sql/rpl_rli.h
>       sql/slave.cc
> === modified file 'sql/rpl_rli.h'
> --- a/sql/rpl_rli.h	2009-12-14 16:32:22 +0000
> +++ b/sql/rpl_rli.h	2010-09-24 09:44:53 +0000
> @@ -94,6 +94,16 @@ public:
>    */
>    MYSQL_BIN_LOG relay_log;
>    LOG_INFO linfo;
> +
> +  /*
> +   cur_log
> +     Pointer that either points at relay_log.get_log_file() or
> +     &rli->cache_buf, depending on whether the log is hot or there was
> +     the need to open a cold relay_log.
> +
> +   cache_buf 
> +     IO_CACHE used when opening cold relay logs.
> +   */
>    IO_CACHE cache_buf,*cur_log;
>  
>    /* The following variables are safe to read any time */
> 
> === modified file 'sql/slave.cc'
> --- a/sql/slave.cc	2010-07-20 18:07:36 +0000
> +++ b/sql/slave.cc	2010-09-24 09:44:53 +0000
> @@ -4349,12 +4349,66 @@ static Log_event* next_event(Relay_log_i
>          DBUG_ASSERT(rli->cur_log_fd == -1);
>  
>          /*
> -          Read pointer has to be at the start since we are the only
> -          reader.
> -          We must keep the LOCK_log to read the 4 first bytes, as this is a hot
> -          log (same as when we call read_log_event() above: for a hot log we
> -          take the mutex).
> +           When the SQL thread is [stopped and] (re)started the
> +           following may happen:
> +
> +           1. Log was hot at stop time and remains hot at restart
> +
> +              SQL thread reads again from hot_log (SQL thread was
> +              reading from the active log when it was stopped and the
> +              very same log is still active on SQL thread restart).
> +
> +              In this case, my_b_seek is performed on cur_log, while
> +              cur_log points to relay_log.get_log_file();
> +
> +           2. Log was hot at stop time but got cold before restart
> +
> +              The log was hot when SQL thread stopped, but it is not
> +              anymore when the SQL thread restarts.
> +
> +              In this case, the SQL thread reopens the log, using
> +              cache_buf, ie, cur_log points to &cache_buf, and thence
> +              its coordinates are reset.
> +
> +           3. Log was already cold at stop time
> +
> +              The log was not hot when the SQL thread stopped, and, of
> +              course, it will not be hot when it restarts.
> +
> +              In this case, the SQL thread opens the cold log again,
> +              using cache_buf, ie, cur_log points to &cache_buf, and
> +              thence its coordinates are reset.
> +
> +           4. Log was hot at stop time, DBA changes to previous cold
> +              log and restarts SQL thread
> +
> +              The log was hot when the SQL thread was stopped, but the
> +              user changed the coordinates of the SQL thread to
> +              restart from a previous cold log.
> +
> +              In this case, at start time, cur_log points to a cold
> +              log, opened using &cache_buf as cache, and coordinates
> +              are reset. However, as it moves on to the next logs, it
> +              will eventually reach the hot log. If the hot log is the
> +              same at the time the SQL thread was stopped, then
> +              coordinates were not reset - the cur_log will point to
> +              relay_log.get_log_file(), and not a freshly opened
> +              IO_CACHE through cache_buf. For this reason we need to
> +              deploy a my_b_seek before calling check_binlog_magic at
> +              this point of the code (see: BUG#55263 for more
> +              details).
> +          
> +          NOTES: 
> +            - We must keep the LOCK_log to read the 4 first bytes, as
> +              this is a hot log (same as when we call read_log_event()
> +              above: for a hot log we take the mutex).
> +
> +            - Because of scenario #4 above, we need to have a
> +              my_b_seek here. Otherwise, we might hit the assertion
> +              inside check_binlog_magic.
>          */
> +
> +        my_b_seek(cur_log, (my_off_t) 0);
>          if (check_binlog_magic(cur_log,&errmsg))
>          {
>            if (!hot_log) pthread_mutex_unlock(log_lock);
> 
> 
> 
> ------------------------------------------------------------------------
> 
> 

Thread
bzr commit into mysql-5.1-bugteam branch (luis.soares:3515) Bug#55263Luis Soares24 Sep
  • Re: bzr commit into mysql-5.1-bugteam branch (luis.soares:3515) Bug#55263Sven Sandberg24 Sep