| List: | Commits | « Previous MessageNext Message » | |
| From: | alfranio correia | Date: | November 12 2008 12:23am |
| Subject: | Re: bzr commit into mysql-5.0-bugteam branch (alfranio.correia:2705) Bug#40337 | ||
| View as plain text | |||
Hi Jasonh, Sorry for the delayed answer. I will take your comments into consideration when producing the new patch. If you take a look at WL#4621, you will find an answer for your concern. Thanks. He Zhenxing wrote: > Hi, > > I have the following comments and concerns about the idea of your patch: > > 1) Your patch will discard the relay log when ever the slave is > restarted regardless of there was a server crash or not, which can > result in many unnecessary relay log retrievals. I would suggest using > some kind of server crash or relay log crash mechanisms to only activate > this recovery when possible relay log crashes detected. > > 2) The recovery of your patch dependent on setting > --sync-relay-log-info=1, so I think it make sense to make the option > --relay-log-recovery implies --sync-relay-log-info=1, which would make > it easier to use by users. Another optional solution is to warning the > user if --sync-relay-log-info=1 is not configure when > --relay-log-recovery is enabled. I'd prefer the first solution, but I'm > also OK with the second one. > > 3) Try to reduce the relay logs need to be retrieved, always fsync the > relay log file when rotating to a new file, by this way, we can be sure > that only the last relay log file can be corrupted, so there's no need > to retrieve logs before the events in the last relay log. > > 4) Change the value of --relay-log-recovery to string, and the value is > the name of the recovery policy, and we can name the policy your patch > introduced 'retrieve' or 'discard-and-retrieve' or something alike, and > we maybe later to add new policies to do the relay-log recovery. Or > maybe we can go even further and name this option --slave-recovery, and > we can later add recovery policies for both SQL and IO thread. > > 1) and 2) are things that I think have to be fixed before pushing, 3) > and 4) are things that I think we'd better have. > > Please don't hesitate to argue! > > > There are also some inline comments regarding of the code: > > Alfranio Correia wrote: > >> #At > file:///home/acorreia/workspace.sun/playground.mysql/mysql-5.0.68-custom_build/ >> >> 2705 Alfranio Correia 2008-10-29 >> Introduced a recovery mechanism that throws away relay-log.bin* retrieved >> from a master in case of a crash and updates the master.info based on >> the relay.info. This is done because such files may be corrupted after >> a crash. We are assuming that relay.info is fsynced per transaction. >> See BUG#40337. >> > > Put the bug number and bug title as the first line of the commit > message: > > BUG#40337 Fsyncing master and relay log to disk after every event is too > slow > > Description of the bug and the solution by this patch. > > >> modified: >> mysql-test/r/rpl_flush_log_loop.result >> sql/mysql_priv.h >> sql/mysqld.cc >> sql/set_var.cc >> sql/slave.cc >> >> per-file messages: >> mysql-test/r/rpl_flush_log_loop.result >> Fixed a test case broken after creating the patch. >> sql/mysql_priv.h >> Created parameter --relay-log-recovery, default = 0 thus keeping the original > behavior. >> sql/mysqld.cc >> Created parameter --relay-log-recovery, default = 0 thus keeping the original > behavior. >> sql/set_var.cc >> Created parameter --relay-log-recovery, default = 0 thus keeping the original > behavior. >> sql/slave.cc >> Provided a recovery mechanism for the master.info and relay-log.bin* based on > the relay.info. >> === modified file 'mysql-test/r/rpl_flush_log_loop.result' >> --- a/mysql-test/r/rpl_flush_log_loop.result 2007-10-10 07:21:11 +0000 >> +++ b/mysql-test/r/rpl_flush_log_loop.result 2008-10-29 16:37:16 +0000 >> @@ -10,6 +10,7 @@ relay_log MYSQLTEST_VARDIR/master-data/r >> relay_log_index >> relay_log_info_file relay-log.info >> relay_log_purge ON >> +relay_log_recovery OFF >> relay_log_space_limit 0 >> stop slave; >> change master to master_host='127.0.0.1',master_user='root', >> >> === modified file 'sql/mysql_priv.h' >> --- a/sql/mysql_priv.h 2008-10-29 12:27:59 +0000 >> +++ b/sql/mysql_priv.h 2008-10-29 16:37:16 +0000 >> @@ -1314,7 +1314,7 @@ extern ulong expire_logs_days; >> extern uint sync_binlog_period, sync_relaylog_period; >> extern ulong opt_tc_log_size, tc_log_max_pages_used, tc_log_page_size; >> extern ulong tc_log_page_waits; >> -extern my_bool relay_log_purge, opt_innodb_safe_binlog, opt_innodb; >> +extern my_bool relay_log_purge, relay_log_recovery, opt_innodb_safe_binlog, > opt_innodb; >> > > use a seperate line when adding new variables, this can reduce > unnecessary merge conflicts > > >> extern uint test_flags,select_errors,ha_open_options; >> extern uint protocol_version, mysqld_port, dropping_tables; >> extern uint delay_key_write_options, lower_case_table_names; >> >> === modified file 'sql/mysqld.cc' >> --- a/sql/mysqld.cc 2008-10-29 12:27:59 +0000 >> +++ b/sql/mysqld.cc 2008-10-29 16:37:16 +0000 >> @@ -404,7 +404,7 @@ ulong opt_ndb_cache_check_time; >> const char *opt_ndb_mgmd; >> ulong opt_ndb_nodeid; >> #endif >> -my_bool opt_readonly, use_temp_pool, relay_log_purge; >> +my_bool opt_readonly, use_temp_pool, relay_log_purge, relay_log_recovery; >> > > Same here, use a seperate line for 'relay_log_recovery' > > >> my_bool opt_sync_frm, opt_allow_suspicious_udfs; >> my_bool opt_secure_auth= 0; >> char* opt_secure_file_priv= 0; >> @@ -4919,6 +4919,7 @@ enum options_mysqld >> OPT_QUERY_CACHE_TYPE, OPT_QUERY_CACHE_WLOCK_INVALIDATE, OPT_RECORD_BUFFER, >> OPT_RECORD_RND_BUFFER, OPT_DIV_PRECINCREMENT, OPT_RELAY_LOG_SPACE_LIMIT, >> OPT_RELAY_LOG_PURGE, >> + OPT_RELAY_LOG_RECOVERY, >> OPT_SLAVE_NET_TIMEOUT, OPT_SLAVE_COMPRESSED_PROTOCOL, OPT_SLOW_LAUNCH_TIME, >> OPT_SLAVE_TRANS_RETRIES, OPT_READONLY, OPT_DEBUGGING, >> OPT_SORT_BUFFER, OPT_TABLE_CACHE, >> @@ -6304,6 +6305,11 @@ The minimum value for this variable is 4 >> (gptr*) &relay_log_purge, >> (gptr*) &relay_log_purge, 0, GET_BOOL, NO_ARG, >> 1, 0, 1, 0, 1, 0}, >> + {"relay_log_recovery", OPT_RELAY_LOG_RECOVERY, >> + "Enables automatic relay log recovery right after the database startup, which > means that the IO Thread starts re-fetching from the master right after the last > transaction processed. Use this parameter together with the sync relay log.", >> > > I think the 'together with sync relay log' at the end of the upper line > should be 'together with sync relay log info', and I'd suggestion > 'together with --sync-relay-log-info=1'. > > >> + (gptr*) &relay_log_recovery, >> + (gptr*) &relay_log_recovery, 0, GET_BOOL, NO_ARG, >> + 0, 0, 1, 0, 1, 0}, >> {"relay_log_space_limit", OPT_RELAY_LOG_SPACE_LIMIT, >> "Maximum space to use for all relay logs.", >> (gptr*) &relay_log_space_limit, >> >> === modified file 'sql/set_var.cc' >> --- a/sql/set_var.cc 2008-10-29 12:27:59 +0000 >> +++ b/sql/set_var.cc 2008-10-29 16:37:16 +0000 >> @@ -323,6 +323,8 @@ sys_var_thd_ulong sys_div_precincrement( >> #ifdef HAVE_REPLICATION >> sys_var_bool_ptr sys_relay_log_purge("relay_log_purge", >> &relay_log_purge); >> +sys_var_bool_ptr sys_relay_log_recovery("relay_log_recovery", >> + &relay_log_recovery); >> #endif >> sys_var_long_ptr sys_rpl_recovery_rank("rpl_recovery_rank", >> &rpl_recovery_rank); >> @@ -734,6 +736,7 @@ sys_var *sys_variables[]= >> &sys_read_rnd_buff_size, >> #ifdef HAVE_REPLICATION >> &sys_relay_log_purge, >> + &sys_relay_log_recovery, >> #endif >> &sys_rpl_recovery_rank, >> &sys_safe_updates, >> @@ -1054,6 +1057,7 @@ struct show_var_st init_vars[]= { >> {"relay_log_index", (char*) &opt_relaylog_index_name, SHOW_CHAR_PTR}, >> {"relay_log_info_file", (char*) &relay_log_info_file, SHOW_CHAR_PTR}, >> {sys_relay_log_purge.name, (char*) &sys_relay_log_purge, > SHOW_SYS}, >> + {sys_relay_log_recovery.name, (char*) &sys_relay_log_recovery, > SHOW_SYS}, >> {"relay_log_space_limit", (char*) &relay_log_space_limit, > SHOW_LONGLONG}, >> #endif >> {sys_rpl_recovery_rank.name,(char*) &sys_rpl_recovery_rank, > SHOW_SYS}, >> >> === modified file 'sql/slave.cc' >> --- a/sql/slave.cc 2008-10-29 12:27:59 +0000 >> +++ b/sql/slave.cc 2008-10-29 16:37:16 +0000 >> @@ -67,6 +67,7 @@ static inline bool io_slave_killed(THD* >> static inline bool sql_slave_killed(THD* thd,RELAY_LOG_INFO* rli); >> static int count_relay_log_space(RELAY_LOG_INFO* rli); >> static int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type); >> +static int init_recovery(MASTER_INFO* mi); >> static int safe_connect(THD* thd, MYSQL* mysql, MASTER_INFO* mi); >> static int safe_reconnect(THD* thd, MYSQL* mysql, MASTER_INFO* mi, >> bool suppress_warnings); >> @@ -174,6 +175,9 @@ int init_slave() >> if (server_id && !master_host && active_mi->host[0]) >> master_host= active_mi->host; >> >> + if (init_recovery(active_mi)) >> + goto err; >> + >> /* If server id is not set, start_slave_thread() will say it */ >> >> if (master_host && !opt_skip_slave_start) >> @@ -197,6 +201,73 @@ err: >> DBUG_RETURN(1); >> } >> >> +/* >> +* Updates the master info based on the information stored in the relay info and >> +* ignores relay logs previously retrieved by the IO thread starting fetching > again >> +* based on to the group_master_log_pos and group_master_log_name. >> +* >> +* There is no need for a mutex >> +* as the caller (i.e. init_slave) already has one acquired. >> +* >> +* The recovery is done just after the database startup which means that >> +* start slave and stop slave commands do not throw away any relay log >> +* retrieved in the meantime. >> +* >> +* In the feature, we can improve this routine in order to avoid throwing >> +* away logs that are safely stored in the disk. Eventually, the old >> +* relay logs will be purged by the normal purge mechanism. >> +* >> +* Specifically, >> +* >> +* 1 - mi->master_log_pos <-- rli->group_master_log_pos >> +* 2 - mi->master_log_name <-- rli->group_master_log_name >> +* 3 - It moves the relay info to the new relay log file, by >> +* rli->group_relay_log_pos <-- rli->group_relay_log_pos <-- > BIN_LOG_HEADER_SIZE; >> +* rli->group_relay_log_name, rli->group_relay_log_pos >> +* >> +* If there is an error, it returns (1), otherwise returns (0). >> +*/ >> +static int init_recovery(MASTER_INFO* mi) >> +{ >> + const char* errmsg=0; >> + DBUG_ENTER("init_recovery"); >> + >> + RELAY_LOG_INFO* rli = &mi->rli; >> + if (relay_log_recovery && rli->group_master_log_name[0]) >> + { >> + mi->master_log_pos = max(BIN_LOG_HEADER_SIZE, >> + rli->group_master_log_pos); >> + strmake(mi->master_log_name, rli->group_master_log_name, >> + sizeof(mi->master_log_name)-1); >> + >> + strmake(rli->group_relay_log_name, rli->relay_log.get_log_fname(), >> + sizeof(rli->group_relay_log_name)-1); >> + strmake(rli->event_relay_log_name, rli->relay_log.get_log_fname(), >> + sizeof(mi->rli.event_relay_log_name)-1); >> + >> + rli->group_relay_log_pos = rli->group_relay_log_pos = > BIN_LOG_HEADER_SIZE; >> + >> + if (init_relay_log_pos(rli, >> + rli->group_relay_log_name, >> + rli->group_relay_log_pos, >> + 0 /*no data lock*/, >> + &errmsg, 0)) >> + DBUG_RETURN(1); >> + >> + if (flush_master_info(mi, 0)) >> + { >> + sql_print_error("Failed to flush master info file"); >> + DBUG_RETURN(1); >> + } >> + if (flush_relay_log_info(rli)) >> + { >> + sql_print_error("Failed to flush master info file"); >> + DBUG_RETURN(1); >> + } >> + } >> + >> + DBUG_RETURN(0); >> +} >> >> static void free_table_ent(TABLE_RULE_ENT* e) >> { >> >> > > >
