MySQL Lists are EOL. Please join:

List:Commits« Previous MessageNext Message »
From:alfranio correia Date:November 11 2008 11:23pm
Subject:Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705)
Bug#40337
View as plain text  
Hi Jasonh,

Sorry for the delayed answer.
I will take your comments into consideration when producing the new patch.

If you take a look at WL#4621, you will find an answer for your concern.

Thanks.


He Zhenxing wrote:
> Hi,
>
> I have the following comments and concerns about the idea of your patch:
>
> 1) Your patch will discard the relay log when ever the slave is
> restarted regardless of there was a server crash or not, which can
> result in many unnecessary relay log retrievals. I would suggest using
> some kind of server crash or relay log crash mechanisms to only activate
> this recovery when possible relay log crashes detected.
>
> 2) The recovery of your patch dependent on setting
> --sync-relay-log-info=1, so I think it make sense to make the option
> --relay-log-recovery implies --sync-relay-log-info=1, which would make
> it easier to use by users. Another optional solution is to warning the
> user if --sync-relay-log-info=1 is not configure when
> --relay-log-recovery is enabled. I'd prefer the first solution, but I'm
> also OK with the second one.
>
> 3) Try to reduce the relay logs need to be retrieved, always fsync the
> relay log file when rotating to a new file, by this way, we can be sure
> that only the last relay log file can be corrupted, so there's no need
> to retrieve logs before the events in the last relay log.
>
> 4) Change the value of --relay-log-recovery to string, and the value is
> the name of the recovery policy, and we can name the policy your patch
> introduced 'retrieve' or 'discard-and-retrieve' or something alike, and
> we maybe later to add new policies to do the relay-log recovery. Or
> maybe we can go even further and name this option --slave-recovery, and
> we can later add recovery policies for both SQL and IO thread.
>
> 1) and 2) are things that I think have to be fixed before pushing, 3)
> and 4) are things that I think we'd better have.
>
> Please don't hesitate to argue!
>
>
> There are also some inline comments regarding of the code:
>
> Alfranio Correia wrote:
>   
>> #At
> file:///home/acorreia/workspace.sun/playground.mysql/mysql-5.0.68-custom_build/
>>
>>  2705 Alfranio Correia	2008-10-29
>>       Introduced a recovery mechanism that throws away relay-log.bin* retrieved
>>       from a master in case of a crash and updates the master.info based on
>>       the relay.info. This is done because such files may be corrupted after
>>       a crash. We are assuming that relay.info is fsynced per transaction.
>>       See BUG#40337.
>>     
>
> Put the bug number and bug title as the first line of the commit
> message:
>
> BUG#40337 Fsyncing master and relay log to disk after every event is too
> slow
>
> Description of the bug and the solution by this patch.
>
>   
>> modified:
>>   mysql-test/r/rpl_flush_log_loop.result
>>   sql/mysql_priv.h
>>   sql/mysqld.cc
>>   sql/set_var.cc
>>   sql/slave.cc
>>
>> per-file messages:
>>   mysql-test/r/rpl_flush_log_loop.result
>>     Fixed a test case broken after creating the patch.
>>   sql/mysql_priv.h
>>     Created parameter --relay-log-recovery, default = 0 thus keeping the original
> behavior.
>>   sql/mysqld.cc
>>     Created parameter --relay-log-recovery, default = 0 thus keeping the original
> behavior.
>>   sql/set_var.cc
>>     Created parameter --relay-log-recovery, default = 0 thus keeping the original
> behavior.
>>   sql/slave.cc
>>     Provided a recovery mechanism for the master.info and relay-log.bin* based on
> the relay.info.
>> === modified file 'mysql-test/r/rpl_flush_log_loop.result'
>> --- a/mysql-test/r/rpl_flush_log_loop.result	2007-10-10 07:21:11 +0000
>> +++ b/mysql-test/r/rpl_flush_log_loop.result	2008-10-29 16:37:16 +0000
>> @@ -10,6 +10,7 @@ relay_log	MYSQLTEST_VARDIR/master-data/r
>>  relay_log_index	
>>  relay_log_info_file	relay-log.info
>>  relay_log_purge	ON
>> +relay_log_recovery	OFF
>>  relay_log_space_limit	0
>>  stop slave;
>>  change master to master_host='127.0.0.1',master_user='root',
>>
>> === modified file 'sql/mysql_priv.h'
>> --- a/sql/mysql_priv.h	2008-10-29 12:27:59 +0000
>> +++ b/sql/mysql_priv.h	2008-10-29 16:37:16 +0000
>> @@ -1314,7 +1314,7 @@ extern ulong expire_logs_days;
>>  extern uint sync_binlog_period, sync_relaylog_period;
>>  extern ulong opt_tc_log_size, tc_log_max_pages_used, tc_log_page_size;
>>  extern ulong tc_log_page_waits;
>> -extern my_bool relay_log_purge, opt_innodb_safe_binlog, opt_innodb;
>> +extern my_bool relay_log_purge, relay_log_recovery, opt_innodb_safe_binlog,
> opt_innodb;
>>     
>
> use a seperate line when adding new variables, this can reduce
> unnecessary merge conflicts
>
>   
>>  extern uint test_flags,select_errors,ha_open_options;
>>  extern uint protocol_version, mysqld_port, dropping_tables;
>>  extern uint delay_key_write_options, lower_case_table_names;
>>
>> === modified file 'sql/mysqld.cc'
>> --- a/sql/mysqld.cc	2008-10-29 12:27:59 +0000
>> +++ b/sql/mysqld.cc	2008-10-29 16:37:16 +0000
>> @@ -404,7 +404,7 @@ ulong opt_ndb_cache_check_time;
>>  const char *opt_ndb_mgmd;
>>  ulong opt_ndb_nodeid;
>>  #endif
>> -my_bool opt_readonly, use_temp_pool, relay_log_purge;
>> +my_bool opt_readonly, use_temp_pool, relay_log_purge, relay_log_recovery;
>>     
>
> Same here, use a seperate line for 'relay_log_recovery'
>
>   
>>  my_bool opt_sync_frm, opt_allow_suspicious_udfs;
>>  my_bool opt_secure_auth= 0;
>>  char* opt_secure_file_priv= 0;
>> @@ -4919,6 +4919,7 @@ enum options_mysqld
>>    OPT_QUERY_CACHE_TYPE, OPT_QUERY_CACHE_WLOCK_INVALIDATE, OPT_RECORD_BUFFER,
>>    OPT_RECORD_RND_BUFFER, OPT_DIV_PRECINCREMENT, OPT_RELAY_LOG_SPACE_LIMIT,
>>    OPT_RELAY_LOG_PURGE,
>> +  OPT_RELAY_LOG_RECOVERY,
>>    OPT_SLAVE_NET_TIMEOUT, OPT_SLAVE_COMPRESSED_PROTOCOL, OPT_SLOW_LAUNCH_TIME,
>>    OPT_SLAVE_TRANS_RETRIES, OPT_READONLY, OPT_DEBUGGING,
>>    OPT_SORT_BUFFER, OPT_TABLE_CACHE,
>> @@ -6304,6 +6305,11 @@ The minimum value for this variable is 4
>>     (gptr*) &relay_log_purge,
>>     (gptr*) &relay_log_purge, 0, GET_BOOL, NO_ARG,
>>     1, 0, 1, 0, 1, 0},
>> +  {"relay_log_recovery", OPT_RELAY_LOG_RECOVERY,
>> +   "Enables automatic relay log recovery right after the database startup, which
> means that the IO Thread starts re-fetching from the master right after the last
> transaction processed. Use this parameter together with the sync relay log.",
>>     
>
> I think the 'together with sync relay log' at the end of the upper line
> should be 'together with sync relay log info', and I'd suggestion
> 'together with --sync-relay-log-info=1'.
>
>   
>> +   (gptr*) &relay_log_recovery,
>> +   (gptr*) &relay_log_recovery, 0, GET_BOOL, NO_ARG,
>> +   0, 0, 1, 0, 1, 0},
>>    {"relay_log_space_limit", OPT_RELAY_LOG_SPACE_LIMIT,
>>     "Maximum space to use for all relay logs.",
>>     (gptr*) &relay_log_space_limit,
>>
>> === modified file 'sql/set_var.cc'
>> --- a/sql/set_var.cc	2008-10-29 12:27:59 +0000
>> +++ b/sql/set_var.cc	2008-10-29 16:37:16 +0000
>> @@ -323,6 +323,8 @@ sys_var_thd_ulong	sys_div_precincrement(
>>  #ifdef HAVE_REPLICATION
>>  sys_var_bool_ptr	sys_relay_log_purge("relay_log_purge",
>>                                              &relay_log_purge);
>> +sys_var_bool_ptr	sys_relay_log_recovery("relay_log_recovery",
>> +                                            &relay_log_recovery);
>>  #endif
>>  sys_var_long_ptr	sys_rpl_recovery_rank("rpl_recovery_rank",
>>  					      &rpl_recovery_rank);
>> @@ -734,6 +736,7 @@ sys_var *sys_variables[]=
>>    &sys_read_rnd_buff_size,
>>  #ifdef HAVE_REPLICATION
>>    &sys_relay_log_purge,
>> +  &sys_relay_log_recovery,
>>  #endif
>>    &sys_rpl_recovery_rank,
>>    &sys_safe_updates,
>> @@ -1054,6 +1057,7 @@ struct show_var_st init_vars[]= {
>>    {"relay_log_index", (char*) &opt_relaylog_index_name, SHOW_CHAR_PTR},
>>    {"relay_log_info_file", (char*) &relay_log_info_file, SHOW_CHAR_PTR},
>>    {sys_relay_log_purge.name,  (char*) &sys_relay_log_purge,        
> SHOW_SYS},
>> +  {sys_relay_log_recovery.name,  (char*) &sys_relay_log_recovery,  
> SHOW_SYS},
>>    {"relay_log_space_limit",  (char*) &relay_log_space_limit,       
> SHOW_LONGLONG},
>>  #endif
>>    {sys_rpl_recovery_rank.name,(char*) &sys_rpl_recovery_rank,      
> SHOW_SYS},
>>
>> === modified file 'sql/slave.cc'
>> --- a/sql/slave.cc	2008-10-29 12:27:59 +0000
>> +++ b/sql/slave.cc	2008-10-29 16:37:16 +0000
>> @@ -67,6 +67,7 @@ static inline bool io_slave_killed(THD* 
>>  static inline bool sql_slave_killed(THD* thd,RELAY_LOG_INFO* rli);
>>  static int count_relay_log_space(RELAY_LOG_INFO* rli);
>>  static int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type);
>> +static int init_recovery(MASTER_INFO* mi);
>>  static int safe_connect(THD* thd, MYSQL* mysql, MASTER_INFO* mi);
>>  static int safe_reconnect(THD* thd, MYSQL* mysql, MASTER_INFO* mi,
>>  			  bool suppress_warnings);
>> @@ -174,6 +175,9 @@ int init_slave()
>>    if (server_id && !master_host && active_mi->host[0])
>>      master_host= active_mi->host;
>>  
>> +  if (init_recovery(active_mi))
>> +    goto err;
>> +
>>    /* If server id is not set, start_slave_thread() will say it */
>>  
>>    if (master_host && !opt_skip_slave_start)
>> @@ -197,6 +201,73 @@ err:
>>    DBUG_RETURN(1);
>>  }
>>  
>> +/*
>> +* Updates the master info based on the information stored in the relay info and
>> +* ignores relay logs previously retrieved by the IO thread starting fetching
> again
>> +* based on to the group_master_log_pos and group_master_log_name.
>> +*
>> +*  There is no need for a mutex 
>> +*  as the caller (i.e. init_slave) already has one acquired.
>> +*
>> +*  The recovery is done just after the database startup which means that
>> +*  start slave and stop slave commands do not throw away any relay log 
>> +*  retrieved in the meantime.
>> +*
>> +*  In the feature, we can improve this routine in order to avoid throwing
>> +*  away logs that are safely stored in the disk. Eventually, the old
>> +*  relay logs will be purged by the normal purge mechanism.
>> +*
>> +* Specifically,
>> +*
>> +*  1 - mi->master_log_pos  <-- rli->group_master_log_pos
>> +*  2 - mi->master_log_name <-- rli->group_master_log_name
>> +*  3 - It moves the relay info to the new relay log file, by
>> +*   rli->group_relay_log_pos <-- rli->group_relay_log_pos <--
> BIN_LOG_HEADER_SIZE;
>> +*   rli->group_relay_log_name, rli->group_relay_log_pos
>> +*
>> +* If there is an error, it returns (1), otherwise returns (0). 
>> +*/
>> +static int init_recovery(MASTER_INFO* mi)
>> +{
>> +  const char* errmsg=0;
>> +  DBUG_ENTER("init_recovery");
>> +
>> +  RELAY_LOG_INFO* rli = &mi->rli;
>> +  if (relay_log_recovery && rli->group_master_log_name[0])
>> +  {
>> +    mi->master_log_pos = max(BIN_LOG_HEADER_SIZE,
>> +                              rli->group_master_log_pos);
>> +    strmake(mi->master_log_name, rli->group_master_log_name,
>> +      sizeof(mi->master_log_name)-1);
>> +
>> +    strmake(rli->group_relay_log_name, rli->relay_log.get_log_fname(),
>> +            sizeof(rli->group_relay_log_name)-1);
>> +    strmake(rli->event_relay_log_name, rli->relay_log.get_log_fname(),
>> +            sizeof(mi->rli.event_relay_log_name)-1);
>> +
>> +    rli->group_relay_log_pos = rli->group_relay_log_pos =
> BIN_LOG_HEADER_SIZE;
>> +
>> +    if (init_relay_log_pos(rli,
>> +                           rli->group_relay_log_name,
>> +                           rli->group_relay_log_pos,
>> +                           0 /*no data lock*/,
>> +                           &errmsg, 0))
>> +      DBUG_RETURN(1);
>> +
>> +    if (flush_master_info(mi, 0))
>> +    {
>> +      sql_print_error("Failed to flush master info file");
>> +      DBUG_RETURN(1);
>> +    }
>> +    if (flush_relay_log_info(rli))
>> +    {
>> +       sql_print_error("Failed to flush master info file");
>> +       DBUG_RETURN(1);
>> +    }
>> +  }
>> +
>> +  DBUG_RETURN(0);
>> +}
>>  
>>  static void free_table_ent(TABLE_RULE_ENT* e)
>>  {
>>
>>     
>
>
>   

Thread
bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705) Bug#40337Alfranio Correia29 Oct
  • Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705) Bug#40337Andrei Elkin30 Oct
    • Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705)Bug#40337alfranio correia31 Oct
      • Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705)Bug#40337alfranio correia31 Oct
      • Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705) Bug#40337Andrei Elkin31 Oct
  • Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705)Bug#40337He Zhenxing3 Nov
    • Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705)Bug#40337alfranio correia12 Nov
Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705)Bug#40337He Zhenxing15 Nov
  • Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705)Bug#40337alfranio correia15 Nov
    • Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705)Bug#40337He Zhenxing15 Nov