MySQL Lists are EOL. Please join:

List:Commits« Previous MessageNext Message »
From:Alfranio Correia Date:September 22 2010 4:10pm
Subject:Re: bzr commit into mysql-5.1-bugteam branch (luis.soares:3489) Bug#55263
View as plain text  
Hi Luis,

Thank you for the feedback.
Please, add the explanation you wrote here to the code.

I would go for solution 1 and file a bug report and a worklog
to handle the refactory.

Cheers.

On 09/22/2010 02:41 PM, Luís Soares wrote:
> Hi Alfranio,
>    This is somewhat hard to trace/debug. That code is not very appealing
> (to say the least), so sorry if I miss something, or got something
> wrong!
>
> On 09/21/2010 09:04 PM, Alfranio Correia wrote:
>> Hi Luis,
>>
>> First of all, sorry for the delay.
>> Let's discuss this tomorrow in order to close this bug asap
>
> OK.
>
>> Excellent description on the problem but I still have some questions
>> on the decision you have made to fix the bug.
>
> OK.
>
>> 1 - Why don't you set the cur_log to a valid value before starting the
>> sql thread?
>
> The cur_log is set to a valid value befor the sql thread starts.
>
> It's the one set when CHANGE MASTER is issued. However, while the SQL
> thread moves processes the events and moves on to the next log,
> eventually it reaches the log that it was reading the first time it
> was stopped. Then, there is no my_b_seek done (as it happens in several
> other places in the code), before check_binlog_magic is called. See,
> next_event, especially this part:
>
> 4340       if (rli->relay_log.is_active(rli->linfo.log_file_name))
> 4341       {
> 4342 #ifdef EXTRA_DEBUG
> 4343         if (global_system_variables.log_warnings)
> 4344           sql_print_information("next log '%s' is currently active",
> 4345                                 rli->linfo.log_file_name);
> 4346 #endif
> 4347         rli->cur_log= cur_log= rli->relay_log.get_log_file();
> (...)  /*** There is no my_b_seek(cur_log, 0) from here to: ***/
> 4358         if (check_binlog_magic(cur_log,&errmsg))
> (...)
>
>> 2 - Why is this not repeatable when you stop and start the slave?
>
> Because, when the SQL thread is stopped and restarted the
> following may happen:
>
> 1. Log was hot at stop time and remains hot at restart
>
>     SQL thread reads again from hot_log (SQL thread was reading
>     from the active log when it was stopped and the very same log
>     is still active on SQL thread restart).
>
>     In this case, my_b_seek is performed on cur_log, while cur_log
>     points to relay_log.get_log_file();
>
>     Result: check_binlog_magic *succeeds*
>
> 2. Log was hot at stop time but got cold before restart
>
>     The log was hot when SQL thread stopped, but it is not anymore
>     when the SQL thread restarts.
>
>     In this case, the SQL thread reopens the log, using cache_buf,
>     ie, cur_log points to&cache_buf, and thence its coordinates
>     are reset.
>
>     Result: check_binlog_magic *succeeds*
>
> 3. Log was already cold at stop time
>
>     The log was not hot when the SQL thread stopped, and, of
>     course, it will not be hot when it restarts.
>
>     In this case, the SQL thread opens the cold log again, using
>     cache_buf, ie, cur_log points to&cache_buf, and thence its
>     coordinates are reset.
>
>     Result: check_binlog_magic *succeeds*
>
> 4. Log was hot at stop time, DBA changes to previous cold log and
>     restarts SQL thread
>
>     The log was hot when the SQL thread was stopped, but the user
>     changed the coordinates of the SQL thread to restart from a
>     previous cold log.
>
>     In this case, at start time, cur_log points to a cold log,
>     opened using&cache_buf as cache, and coordinates are
>     reset. However, as it moves on to the next logs, it will
>     eventually reach the hot log. If the hot log is the same at
>     the time the SQL thread was stopped, then coordinates were not
>     reset - the cur_log will point to relay_log.get_log_file(),
>     and not a freshly opened IO_CACHE through cache_buf,
>     triggering the assertion (see that in next_event, after
>     checking whether the log is hot or not there is no my_b_seek
>     before calling check_binlog_magic - see snippet above).
>
>     Result: check_binlog_magic *FAILS*
>
> So I considered two alternatives:
>
>    1. to deploy yet another my_b_seek in the snippet I pasted
>       above.
>
>    2. or make the my_b_seek at SQL (re)start time, always reset
>       the hot_log IO_CACHE, the cold logs will have to be
>       reopened, thence, they should be safe.
>
> I went for the second one, as I prefer not to had one more
> my_b_seek.
>
> Finally, I think this code needs to be restructured/refactored
> sometime in the comming future.
>
>> Finally, I have a requested. Please, add a better explanation on the
>> variables
>> cur_log, cur_log_fd, etc... It is very hard to understand why they are
>> defined
>> without using gdb and reading several pieces of the code.
>
> /**
>    Pointer that either points at relay_log.get_log_file() or
>    &rli->cache_buf, depending on whether the log is hot or there
>    was the need to open a cold relay_log.
> */
> cur_log : IO_CACHE*
>
> /**
>    File descriptor of the current read relay log.
>   */
> rli->cur_log_fd : File
>
> /**
>    IO_CACHE used when opening cold relay logs.
>   */
> rli->cache_buf : IO_CACHE
>
> /**
>    Gets the IO_CACHE* to the log file referenced at the super
>    class MYSQL_LOG.
>   */
> MYSQL_BIN_LOG::get_log_file
>
>
> cur_log is set in:
>
>   1. init_relay_log_pos
>
>      a) my_b_seek((rli->cur_log=rli->relay_log.get_log_file()), (off_t)0);
>         if we are reading from the binary log that the IO thread is using
>
>      b) rli->cur_log =&rli->cache_buf;
>         if we are reading from a cold binary log
>
>   2. reopen_relay_log
>
>      a) rli->cur_log=&rli->cache_buf;
>
>   3. next_event
>
>      a) rli->cur_log= cur_log= rli->relay_log.get_log_file()
>         if we are reading from the binary log that the IO thread is using
>
>
> Regards,
> Luís Soares
>
>> Cheers.
>>
>> On 07/23/2010 05:32 PM, Luis Soares wrote:
>>> #At
>>> file:///home/lsoares/Workspace/bzr/work/bugfixing/55263/mysql-5.1-bugteam/
>>> based on revid:ramil@stripped
>>>
>>>    3489 Luis Soares    2010-07-23
>>>         BUG#55263: assert in check_binlog_magic
>>>
>>>         The procedure for setting up a fake binary log, by changing the
>>>         relay log files manually, is run twice when we issue mtr with
>>>         --repeat=2. However, when setting it up for the second time,
>>>         neither the sql thread is reset nor the server is restarted. This
>>>         means that previous stale relay log IO_CACHE - from first run -
>>>         is left around. As a consequence, when the test is run for the
>>>         second time, the IO_CACHE for the relay log has position in file
>>>         different than zero, triggering the assertion.
>>>
>>>         We fix this by changing the call to my_b_seek at the very
>>>         beginning of init_master_info. Instead of doing a seek on the
>>>         cur_log we now do a seek on the currently active relay log. This
>>>         makes sure that even if the SQL thread had read the log while it
>>>         was hot in the past, the position will be set to 0 if it
>>>         continues hot when the SQL thread is restarted and switches back
>>>         to it. If, on the other hand, the log became cold while the SQL
>>>         was stopped and restarted, then when the thread switches back
>>> to it
>>>         the log will have to be reopened, thus the position will start
>>>         from the 0, meaning that there is no need to do a seek.
>>>
>>>       modified:
>>>         sql/rpl_mi.cc
>>> === modified file 'sql/rpl_mi.cc'
>>> --- a/sql/rpl_mi.cc    2010-05-24 16:43:27 +0000
>>> +++ b/sql/rpl_mi.cc    2010-07-23 16:32:55 +0000
>>> @@ -123,7 +123,9 @@ int init_master_info(Master_info* mi, co
>>>        */
>>>        if (thread_mask&   SLAVE_SQL)
>>>        {
>>> -      my_b_seek(mi->rli.cur_log, (my_off_t) 0);
>>> +      pthread_mutex_lock(mi->rli.relay_log.get_log_lock());
>>> +      my_b_seek(mi->rli.relay_log.get_log_file(), (my_off_t) 0);
>>> +      pthread_mutex_unlock(mi->rli.relay_log.get_log_lock());
>>>        }
>>>        DBUG_RETURN(0);
>>>      }
>>>
>>>
>>>
>>>
>>>
>>
>
>

Thread
bzr commit into mysql-5.1-bugteam branch (luis.soares:3489) Bug#55263Luis Soares23 Jul
  • Re: bzr commit into mysql-5.1-bugteam branch (luis.soares:3489) Bug#55263Alfranio Correia21 Sep
    • Re: bzr commit into mysql-5.1-bugteam branch (luis.soares:3489) Bug#55263Luís Soares22 Sep
      • Re: bzr commit into mysql-5.1-bugteam branch (luis.soares:3489) Bug#55263Alfranio Correia22 Sep
        • Re: bzr commit into mysql-5.1-bugteam branch (luis.soares:3489) Bug#55263Luís Soares24 Sep
          • Re: bzr commit into mysql-5.1-bugteam branch (luis.soares:3489) Bug#55263Alfranio Correia24 Sep