Hi Jason and Sinisa!
Here are my comments on the patch...
Just my few cents,
Mats Kindahl
He Zhenxing wrote:
> #At file:///media/sda3/work/mysql/bzrwork/b31665/5.1-rpl/
Just a note, to avoid misunderstandings: according to bug report, this
patch is to be for 6.0, not 5.1. I would also advice against
introducing a patch of this kind into 5.1 since it risks de-stabilizing
the server.
> 2611 He Zhenxing 2008-07-07
> BUG#31665 sync_binlog should cause relay logs to be synchronized
> BUG#35542 Add option to sync master and relay log to disk after every event
>
> Add sync_relay_log option to server, this option works for relay log
> the same as option sync_binlog for binlog. This option also synchronize
> master info to disk when set to non-zero value.
>
> Original patches from Sinisa and Mark, with some modifications
> modified:
> sql/log.cc
> sql/log.h
> sql/mysql_priv.h
> sql/mysqld.cc
> sql/rpl_mi.cc
> sql/rpl_rli.cc
> sql/sql_repl.cc
>
> per-file messages:
> sql/log.cc
> remove global sync_binlog_counter, it is now a member of MYSQL_BIN_LOG.
> initialize mysql_bin_log with the address of sync_binlog_period.
> call flush_and_sync when relay events are written.
> sql/log.h
> Add sync_counter and sync_period_ptr members to MYSQL_BIN_LOG
> sql/mysql_priv.h
> remove sync_binlog_counter, add sync_relaylog_period variable
> sql/mysqld.cc
> Add sync-relay-log option
> sql/rpl_mi.cc
> Synchronize master info if sync_relaylog_period is non-zero.
> sql/rpl_rli.cc
> Initialize MYSQL_BIN_LOG instance in Relay_log_info with the address of
> sync_relaylog_period
> sql/sql_repl.cc
> Add sync_relaylog_period system variable.
> === modified file 'sql/log.cc'
> --- a/sql/log.cc 2008-03-31 08:57:18 +0000
> +++ b/sql/log.cc 2008-07-07 08:51:09 +0000
> @@ -49,8 +49,7 @@
>
> LOGGER logger;
>
> -MYSQL_BIN_LOG mysql_bin_log;
> -ulong sync_binlog_counter= 0;
> +MYSQL_BIN_LOG mysql_bin_log(&sync_binlog_period);
Note: there is no mutex or other form of lock protecting this variable.
In the event that the ulong is split into several registers, the two
parts might be read and written separately.
> static bool test_if_number(const char *str,
> long *res, bool allow_wildcards);
> @@ -2337,9 +2336,10 @@ const char *MYSQL_LOG::generate_name(con
>
>
>
> -MYSQL_BIN_LOG::MYSQL_BIN_LOG()
> +MYSQL_BIN_LOG::MYSQL_BIN_LOG(ulong *sync_period)
> :bytes_written(0), prepared_xids(0), file_id(1), open_count(1),
> need_start_event(TRUE), m_table_map_version(0),
> + sync_period_ptr(sync_period),
> description_event_for_exec(0), description_event_for_queue(0)
> {
> /*
> @@ -3466,6 +3466,8 @@ bool MYSQL_BIN_LOG::append(Log_event* ev
> }
> bytes_written+= ev->data_written;
> DBUG_PRINT("info",("max_size: %lu",max_size));
> + if (flush_and_sync())
> + goto err;
> if ((uint) my_b_append_tell(&log_file) > max_size)
> new_file_without_locking();
>
> @@ -3496,6 +3498,8 @@ bool MYSQL_BIN_LOG::appendv(const char*
> bytes_written += len;
> } while ((buf=va_arg(args,const char*)) && (len=va_arg(args,uint)));
> DBUG_PRINT("info",("max_size: %lu",max_size));
> + if (flush_and_sync())
> + goto err;
> if ((uint) my_b_append_tell(&log_file) > max_size)
> new_file_without_locking();
>
> @@ -3512,9 +3516,9 @@ bool MYSQL_BIN_LOG::flush_and_sync()
> safe_mutex_assert_owner(&LOCK_log);
> if (flush_io_cache(&log_file))
> return 1;
> - if (++sync_binlog_counter >= sync_binlog_period && sync_binlog_period)
> + if (*sync_period_ptr && ++sync_counter >= *sync_period_ptr)
> {
> - sync_binlog_counter= 0;
> + sync_counter= 0;
> err=my_sync(fd, MYF(MY_WME));
> }
> return err;
If the variable is written at the same time as it is read, this might
cause a flush to be missed or an extra flush to be executed. Thanks to
the fact that >= is used instead of ==, it will not cease flushing the
relay log in the case of a race condition. However, optimization might
change the code in unpredictable ways, so I would suggest adding a mutex
to protect the variable and add that as an argument to the MYSQL_BIN_LOG
constructor, or use the mutex associated with the
sys_var_sync_relaylog_period introduced below (which is
LOCK_global_system_variables).
An alternative would be to make the variable fully internal in the
MYSQL_BIN_LOG class, and then change its value through getters and
setters, which can then appropriately protect the variable from race
conditions by using a fully internal mutex or rwlock, whichever seems
most appropriate.
>
> === modified file 'sql/log.h'
> --- a/sql/log.h 2007-10-30 08:03:34 +0000
> +++ b/sql/log.h 2008-07-07 08:51:09 +0000
> @@ -262,6 +262,13 @@ class MYSQL_BIN_LOG: public TC_LOG, priv
>
> ulonglong m_table_map_version;
>
> + /* pointer to the sync period variable, for binlog this will be
> + sync_binlog_period, for relay log this will be
> + sync_relay_log_period
> + */
> + ulong *sync_period_ptr;
> + ulong sync_counter;
> +
> int write_to_file(IO_CACHE *cache);
> /*
> This is used to start writing to a new log file. The difference from
> @@ -285,7 +292,7 @@ public:
> Format_description_log_event *description_event_for_exec,
> *description_event_for_queue;
>
> - MYSQL_BIN_LOG();
> + MYSQL_BIN_LOG(ulong *sync_period);
> /*
> note that there's no destructor ~MYSQL_BIN_LOG() !
> The reason is that we don't want it to be automatically called
Uh, oh! .. not your code, but I haven't seen this before. There is
always a destructor, not specifying one means you get the "free" one
(default destructor).
>
> === modified file 'sql/mysql_priv.h'
> --- a/sql/mysql_priv.h 2008-03-31 07:40:39 +0000
> +++ b/sql/mysql_priv.h 2008-07-07 08:51:09 +0000
> @@ -1874,7 +1874,7 @@ extern ulong specialflag;
> #endif /* MYSQL_SERVER || INNODB_COMPATIBILITY_HOOKS */
> #ifdef MYSQL_SERVER
> extern ulong current_pid;
> -extern ulong expire_logs_days, sync_binlog_period, sync_binlog_counter;
> +extern ulong expire_logs_days, sync_binlog_period, sync_relaylog_period;
> extern ulong opt_tc_log_size, tc_log_max_pages_used, tc_log_page_size;
> extern ulong tc_log_page_waits;
> extern my_bool relay_log_purge, opt_innodb_safe_binlog, opt_innodb;
>
> === modified file 'sql/mysqld.cc'
> --- a/sql/mysqld.cc 2008-03-31 07:40:39 +0000
> +++ b/sql/mysqld.cc 2008-07-07 08:51:09 +0000
> @@ -505,7 +505,8 @@ ulong max_prepared_stmt_count;
> */
> ulong prepared_stmt_count=0;
> ulong thread_id=1L,current_pid;
> -ulong slow_launch_threads = 0, sync_binlog_period;
> +ulong slow_launch_threads = 0, sync_binlog_period= 0;
> +ulong sync_relaylog_period= 0;
> ulong expire_logs_days = 0;
> ulong rpl_recovery_rank=0;
> const char *log_output_str= "FILE";
> @@ -5523,7 +5524,8 @@ enum options_mysqld
> OPT_MIN_EXAMINED_ROW_LIMIT,
> OPT_LOG_SLOW_SLAVE_STATEMENTS,
> OPT_OLD_MODE,
> - OPT_SLAVE_EXEC_MODE
> + OPT_SLAVE_EXEC_MODE,
> + OPT_SYNC_RELAY_LOG,
> };
>
>
> @@ -6752,6 +6754,11 @@ The minimum value for this variable is 4
> "Use 0 (default) to disable synchronous flushing.",
> (uchar**) &sync_binlog_period, (uchar**) &sync_binlog_period, 0,
> GET_ULONG,
> REQUIRED_ARG, 0, 0, ULONG_MAX, 0, 1, 0},
> + {"sync-relay-log", OPT_SYNC_RELAY_LOG,
> + "Synchronously flush relay log to disk after every #th event. "
> + "Use 0 (default) to disable synchronous flushing.",
> + (uchar**) &sync_relaylog_period, (uchar**) &sync_relaylog_period, 0,
> GET_ULONG,
> + REQUIRED_ARG, 0, 0, ~0L, 0, 1, 0},
> {"sync-frm", OPT_SYNC_FRM, "Sync .frm to disk on create. Enabled by default.",
> (uchar**) &opt_sync_frm, (uchar**) &opt_sync_frm, 0, GET_BOOL, NO_ARG, 1,
> 0,
> 0, 0, 0, 0},
>
> === modified file 'sql/rpl_mi.cc'
> --- a/sql/rpl_mi.cc 2007-12-14 13:21:37 +0000
> +++ b/sql/rpl_mi.cc 2008-07-07 08:51:09 +0000
> @@ -342,6 +342,7 @@ int flush_master_info(Master_info* mi, b
> {
> IO_CACHE* file = &mi->file;
> char lbuf[22];
> + int err= 0;
>
> DBUG_ENTER("flush_master_info");
> DBUG_PRINT("enter",("master_pos: %ld", (long) mi->master_log_pos));
> @@ -388,7 +389,10 @@ int flush_master_info(Master_info* mi, b
> mi->password, mi->port, mi->connect_retry,
> (int)(mi->ssl), mi->ssl_ca, mi->ssl_capath, mi->ssl_cert,
> mi->ssl_cipher, mi->ssl_key, mi->ssl_verify_server_cert);
> - DBUG_RETURN(-flush_io_cache(file));
> + err= flush_io_cache(file);
> + if (sync_relaylog_period & !err)
> + err= my_sync(mi->fd, MYF(MY_WME));
> + DBUG_RETURN(-err);
> }
>
>
>
> === modified file 'sql/rpl_rli.cc'
> --- a/sql/rpl_rli.cc 2008-03-26 09:27:00 +0000
> +++ b/sql/rpl_rli.cc 2008-07-07 08:51:09 +0000
> @@ -32,7 +32,8 @@ int init_strvar_from_file(char *var, int
> Relay_log_info::Relay_log_info()
> :Slave_reporting_capability("SQL"),
> no_storage(FALSE), replicate_same_server_id(::replicate_same_server_id),
> - info_fd(-1), cur_log_fd(-1), save_temporary_tables(0),
> + info_fd(-1), cur_log_fd(-1), relay_log(&sync_relaylog_period),
> + save_temporary_tables(0),
> #if HAVE_purify
> is_fake(FALSE),
> #endif
>
> === modified file 'sql/sql_repl.cc'
> --- a/sql/sql_repl.cc 2008-02-19 12:58:08 +0000
> +++ b/sql/sql_repl.cc 2008-07-07 08:51:09 +0000
> @@ -1667,6 +1667,7 @@ static sys_var_long_ptr sys_slave_net_ti
> static sys_var_long_ptr sys_slave_trans_retries(&vars,
> "slave_transaction_retries",
> &slave_trans_retries);
> static sys_var_sync_binlog_period sys_sync_binlog_period(&vars, "sync_binlog",
> &sync_binlog_period);
> +static sys_var_sync_binlog_period sys_sync_relaylog_period(&vars,
> "sync_relaylog", &sync_relaylog_period);
> static sys_var_slave_skip_counter sys_slave_skip_counter(&vars,
> "sql_slave_skip_counter");
>
> static int show_slave_skip_errors(THD *thd, SHOW_VAR *var, char *buff);
> @@ -1759,7 +1760,7 @@ bool sys_var_slave_skip_counter::update(
>
> bool sys_var_sync_binlog_period::update(THD *thd, set_var *var)
> {
> - sync_binlog_period= (ulong) var->save_result.ulonglong_value;
> + *value= (ulong) var->save_result.ulonglong_value;
Assume that there are two threads modifying this variable at the same
time (say, to 0x8000 and 0x0040) as it is being read, and also that the
variable is split into two parts for some reason. We can then have the
following executions:
T1 T2
HIGH(value) := 0x80
HIGH(value) := 0x00
LOW(value) := 0x40
LOW(value) := 0x00
with the end result of a value == 0x0000
To update this value correctly, please use my_atomic_cas32 or add a
mutex. Note that this is just an example: there is no way to tell what
an optimizer decides is efficient code, so better code to avoid race
conditions.
> return 0;
> }
>
>
>
--
Mats Kindahl
Lead Software Developer
Replication Team
MySQL AB, www.mysql.com
| Thread |
|---|
| • bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542, Bug#31665 | He Zhenxing | 7 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542, Bug#31665 | Mats Kindahl | 7 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542,Bug#31665 | He Zhenxing | 8 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542, Bug#31665 | Mats Kindahl | 8 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611)Bug#35542, Bug#31665 | He Zhenxing | 9 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542,Bug#31665 | Sinisa Milivojevic | 9 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542,Bug#31665 | Davi Arnaut | 9 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542,Bug#31665 | He Zhenxing | 10 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542,Bug#31665 | Sinisa Milivojevic | 11 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542,Bug#31665 | He Zhenxing | 14 Jul |
| • Re: bzr commit into mysql-5.1-rpl branch (hezx:2611) Bug#35542, Bug#31665 | Mats Kindahl | 9 Jul |