Hi Andrei!
Comments inline below.
Just my few cents,
Mats Kindahl
Andrei Elkin wrote:
> Below is the list of changes that have just been committed into a local
> 6.0 repository of aelkin. When aelkin does a push these changes
> will be propagated to the main repository and, within 24 hours after the
> push, to the public repository.
> For information on how to access the public repository
> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>
> ChangeSet@stripped, 2008-05-08 21:00:35+03:00, aelkin@stripped +13 -0
> Bug #25998 problems about circle replication
> Bug #27808 Infinite looping in circular replication
>
> In case of withdrawing one of the servers from the circular multi-master
> replication group
> events generated by the removed server could become unstoppable (bug#25998).
> That's because the originator had been the terminator of the own event flow.
>
"circular multi-master replication group" ? We don't have multi-master.
If you mean something else, then please re-write this paragraph to
clarify the situation.
> Other possibility of the unstoppable event is the cluster replication (bug#27808).
> In that case an event generated by a member of the same cluster and replicated to
> another member
> got accepted despite effects of the event had been propagated across the cluster
> via cluster
> communications.
>
> Both artifacts are fixable with introducing a new option for CHANGE MASTER.
>
It's very hard to read that paragraph. I suggest something along the
line: "Another case can occur when using cluster replication
(BUG#27808). In that case, the cluster can generate an event that is
replicated to another member connected to the same cluster. Since the
server id's are different, the event will be executed and the change
applied to the cluster."
"Artifact" is not a good word in these situations, you should use
"problem" instead.
>
> MASTER_IGNORE_SERVER_ID= some_server_id
>
> can be repeated multiple times to list all the servers from which events would not
> be put into
> relay log and therefore would not be executed.
>
Why not have a comma-separated list of server ids?
>
> Fixed with implementing the feature.
>
> Properties of the feature:
>
> a. reporting an error if the id of an ignored server is the slave itself and
> its configuration on startup was with --replicate-same-server-id;
>
> b. overriding the existing MASTER_IGNORE_SERVER_ID list by the following
> CHANGE MASTER ... MASTER_IGNORE_SERVER_ID=;
> c. preserving the existing list by CHANGE MASTER w/o MASTER_IGNORE_SERVER_ID;
>
If you pass a comma-separated list, this complication of the logic goes
away.
> d. resetting the ignored server ids with RESET SLAVE;
>
RESET SLAVE does not change MASTER_HOST, nor any of the other
parameters, so it does not make sense that it should reset the list of
ignored server ids.
> e. extending SHOW SLAVE STATUS with a new line listing ignored servers;
>
OK.
> f. Differently from --replicate-same-server-id handling, the sql thread is not
> concerned with the ignored server ids, because it's supposed that
> the relay log constists only of events that can not be unstoppable.
> In order to guarantee that, e.g in case of the circular replication with a
> failing
> server DBA needs to change master necessarily using the new option.
>
OK.
> g. Rotate and FD events originated by the current master listed
> in the ignored list are still relay-logged which does not create
> any termination issue.
>
OK.
> h. The possible list of ignored servers is sorted for the fastest processing of
> filtering
> algorithm.
>
OK.
>
> Two new lines to show slave status output are added: the list of ignored servers
> and
> the current master server id.
>
OK.
>
> Use cases for this feature can be found on the bug report page.
>
> include/my_sys.h@stripped, 2008-05-08 21:00:32+03:00, aelkin@stripped +2
> -0
> search function for a specific type of dynamic array with long type items.
>
> mysql-test/suite/rpl/r/rpl_server_id_ignore.result@stripped, 2008-05-08 21:00:33+03:00,
> aelkin@stripped +44 -0
> new results
>
> mysql-test/suite/rpl/r/rpl_server_id_ignore.result@stripped, 2008-05-08 21:00:33+03:00,
> aelkin@stripped +0 -0
>
> mysql-test/suite/rpl/t/rpl_server_id_ignore-slave.opt@stripped, 2008-05-08
> 21:00:33+03:00, aelkin@stripped +1 -0
> deliberately requesting a clashing the new change-master option
> --replicate-same-server-id in order to show reaction.
>
Please rewrite this comment. It doesn't make any sense.
> mysql-test/suite/rpl/t/rpl_server_id_ignore-slave.opt@stripped, 2008-05-08
> 21:00:33+03:00, aelkin@stripped +0 -0
>
> mysql-test/suite/rpl/t/rpl_server_id_ignore.test@stripped, 2008-05-08 21:00:33+03:00,
> aelkin@stripped +107 -0
> The basic tests for the new feature.
>
> mysql-test/suite/rpl/t/rpl_server_id_ignore.test@stripped, 2008-05-08 21:00:33+03:00,
> aelkin@stripped +0 -0
>
> mysys/array.c@stripped, 2008-05-08 21:00:32+03:00, aelkin@stripped +37 -0
> Implementation of the search function.
> The third in-out arg is provided for the caller if it needs the index
> of the found search pattern.
>
"The third in-out arg"?! I cannot decode this without reading the code.
Give the function name, what parameter you added, and the purpose of
that parameter.
> sql/lex.h@stripped, 2008-05-08 21:00:32+03:00, aelkin@stripped +1 -0
> new keyword
>
What new keyword?
> sql/rpl_mi.cc@stripped, 2008-05-08 21:00:32+03:00, aelkin@stripped +67 -4
> initialization, storing into master info, reading from the file, deallocation
> of the list of ignored server id:s.
>
"initialization"? What did you do here?
"storing into master info"? Storing what into master info? What did you
do here?
"reading from the file"? What file? What did you do here?
> sql/rpl_mi.h@stripped, 2008-05-08 21:00:32+03:00, aelkin@stripped +4 -0
> a new method answering filtering question;
> a new member to hold the current master server id.
>
> sql/share/errmsg.txt@stripped, 2008-05-08 21:00:33+03:00, aelkin@stripped
> +2 -0
> a new error message about possible clashing of options.
>
> sql/slave.cc@stripped, 2008-05-08 21:00:32+03:00, aelkin@stripped +144
> -8
> a helper function to read an array of variable size of intergers.
> Refining io-thread's filtering condition to consult with the list of ignored
> servers.
> Note, that FD and Rotate events from an ignored server being the current master
> are still accepted.
> Filtering basing on --replicate-same-server-id remains as before; if the ignored
> servers
> list is empty
> Adding two new lines in show slave status.
>
This is very hard to read: could you please re-write that cset comment.
What helper function?
> sql/sql_lex.h@stripped, 2008-05-08 21:00:32+03:00, aelkin@stripped +1 -0
> parser time storage for variable size array of server ids
>
> sql/sql_repl.cc@stripped, 2008-05-08 21:00:32+03:00, aelkin@stripped +46
> -13
> aux function for sorting the ignored server_id:s;
> sorting of the ignored servers implementation;
> failing change master do not forget to free lex's dynamical storage with parsed
> ignored server_ids;
> resetting the old list of ignored server if the parser detected the new list;
> shifting some pieces of code in order to comply with the above logics;
>
> sql/sql_yacc.yy@stripped, 2008-05-08 21:00:32+03:00, aelkin@stripped +8
> -0
> allocation a dyn array for possible list of ignored server ids;
> filling the array;
> resetting (missed part of wl#342) heartbeat_opt at the beginning of
> CHANGE MASTER processing;
>
I don't follow why you need to reset the heartbeat option when doing
CHANGE MASTER and why you need to do this change in this patch.
> diff -Nrup a/include/my_sys.h b/include/my_sys.h
> --- a/include/my_sys.h 2008-04-01 13:56:27 +03:00
> +++ b/include/my_sys.h 2008-05-08 21:00:32 +03:00
> @@ -804,6 +804,8 @@ extern int get_index_dynamic(DYNAMIC_AR
> #define push_dynamic(A,B) insert_dynamic((A),(B))
> #define reset_dynamic(array) ((array)->elements= 0)
> #define sort_dynamic(A,cmp) my_qsort((A)->buffer, (A)->elements,
> (A)->size_of_element, (cmp))
> +extern my_bool dynamic_array_ulong_bin_search(DYNAMIC_ARRAY *a,
> + const ulong *pattern, uint * index);
>
> extern my_bool init_dynamic_string(DYNAMIC_STRING *str, const char *init_str,
> size_t init_alloc,size_t alloc_increment);
> diff -Nrup a/mysql-test/suite/rpl/r/rpl_server_id_ignore.result
> b/mysql-test/suite/rpl/r/rpl_server_id_ignore.result
> --- /dev/null Wed Dec 31 16:00:00 196900
> +++ b/mysql-test/suite/rpl/r/rpl_server_id_ignore.result 2008-05-08 21:00:33 +03:00
> @@ -0,0 +1,44 @@
> +stop slave;
> +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
> +reset master;
> +reset slave;
> +drop table if exists t1,t2,t3,t4,t5,t6,t7,t8,t9;
> +start slave;
> +master_id: 1
> +stop slave;
> +*** --replicate-same-server-id and change master option can clash ***
> +change master to MASTER_IGNORE_SERVER_ID=2, MASTER_IGNORE_SERVER_ID=1;
> +ERROR HY000: The requested server id 2 clashes with the slave startup option
> --replicate-same-server-id
> +*** must be empty due to the error ***
> +ignore server id list:
> +change master to MASTER_IGNORE_SERVER_ID=10, MASTER_IGNORE_SERVER_ID=100;
> +*** must be 10, 100 ***
> +ignore server id list: 10, 100
> +reset slave;
> +*** must be empty due to reset slave ***
> +ignore server id list:
> +change master to MASTER_IGNORE_SERVER_ID=10, MASTER_IGNORE_SERVER_ID=100;
> +*** CHANGE MASTER with MASTER_IGNORE_SERVER_ID option overrides (does not increment)
> the previous setup ***
> +change master to MASTER_IGNORE_SERVER_ID=1, MASTER_IGNORE_SERVER_ID=3,
> MASTER_IGNORE_SERVER_ID=4;
> +*** must be 1, 3, 4 due to overriding policy ***
> +ignore server id list: 1, 3, 4
> +*** ignore master (server 1) queries for a while ***
> +start slave;
> +create table t1 (n int);
> +*** must be empty as the event is to be filtered out ***
> +show tables;
> +Tables_in_test
> +*** allowing events from master ***
> +stop slave;
> +reset slave;
> +*** must be empty due to reset slave ***
> +ignore server id list:
> +change master to master_host='127.0.0.1', master_user='root',
> +master_password='', master_port=MASTER_PORT;
> +start slave;
> +*** must have caught create table ***
> +show tables;
> +Tables_in_test
> +t1
> +drop table t1;
> +end of the tests
>
Very good.
> diff -Nrup a/mysql-test/suite/rpl/t/rpl_server_id_ignore-slave.opt
> b/mysql-test/suite/rpl/t/rpl_server_id_ignore-slave.opt
> --- /dev/null Wed Dec 31 16:00:00 196900
> +++ b/mysql-test/suite/rpl/t/rpl_server_id_ignore-slave.opt 2008-05-08 21:00:33
> +03:00
> @@ -0,0 +1 @@
> +--disable-log-slave-updates --replicate-same-server-id
> diff -Nrup a/mysql-test/suite/rpl/t/rpl_server_id_ignore.test
> b/mysql-test/suite/rpl/t/rpl_server_id_ignore.test
> --- /dev/null Wed Dec 31 16:00:00 196900
> +++ b/mysql-test/suite/rpl/t/rpl_server_id_ignore.test 2008-05-08 21:00:33 +03:00
> @@ -0,0 +1,107 @@
> +# This test checks that the slave ignores queries originating
> +# from the list of ignored originators (bug#27808 etc)
> +#
> +# phases of tests:
> +#
> +# 0. master_id new line in show slave status
> +# 1. syntax related:
> +# a. error reporting if options clash;
> +# b. overriding the old MASTER_IGNORE_SERVER_ID setup by the following
> +# CHANGE MASTER ... MASTER_IGNORE_SERVER_ID=;
> +# c. the old setup preserving by CHANGE MASTER w/o MASTER_IGNORE_SERVER_ID
> +# d. resetting the ignored server ids with RESET SLAVE
> +# 2. run time related:
> +# a. observing no processing events from a master listed in
> MASTER_IGNORE_SERVER_ID:s
> +# b. nullifying the list and resuming of taking binlog from the very beginning
> with
> +# executing events this time
> +
> +source include/master-slave.inc;
> +
> +connection slave;
> +source include/wait_for_slave_to_start.inc;
>
Is this necessary? Shouldn't this line be added to master-slave.inc
instead in that case?
> +# a new line for master_id
> +let $master_id= query_get_value(SHOW SLAVE STATUS, Master_Server_Id, 1);
> +echo master_id: $master_id;
> +
> +stop slave;
>
wait for slave to stop.
> +--echo *** --replicate-same-server-id and change master option can clash ***
> +--error ER_SLAVE_IGNORE_SERVER_ID
> +eval change master to MASTER_IGNORE_SERVER_ID=2, MASTER_IGNORE_SERVER_ID=1;
>
There are no variables here, so you don't need the "eval".
> +--echo *** must be empty due to the error ***
> +let $ignore_list= query_get_value(SHOW SLAVE STATUS, Replicate_Ignore_Server_id,
> 1);
> +echo ignore server id list: $ignore_list;
> +
> +eval change master to MASTER_IGNORE_SERVER_ID=10, MASTER_IGNORE_SERVER_ID=100;
> +--echo *** must be 10, 100 ***
> +let $ignore_list= query_get_value(SHOW SLAVE STATUS, Replicate_Ignore_Server_id,
> 1);
> +echo ignore server id list: $ignore_list;
> +reset slave;
> +--echo *** must be empty due to reset slave ***
> +let $ignore_list= query_get_value(SHOW SLAVE STATUS, Replicate_Ignore_Server_id,
> 1);
> +echo ignore server id list: $ignore_list;
> +eval change master to MASTER_IGNORE_SERVER_ID=10, MASTER_IGNORE_SERVER_ID=100;
>
No "eval" necessary.
> +--echo *** CHANGE MASTER with MASTER_IGNORE_SERVER_ID option overrides (does not
> increment) the previous setup ***
> +eval change master to MASTER_IGNORE_SERVER_ID=1, MASTER_IGNORE_SERVER_ID=3,
> MASTER_IGNORE_SERVER_ID=4;
>
No "eval" necessary.
> +--echo *** must be 1, 3, 4 due to overriding policy ***
> +let $ignore_list= query_get_value(SHOW SLAVE STATUS, Replicate_Ignore_Server_id,
> 1);
> +echo ignore server id list: $ignore_list;
> +--echo *** ignore master (server 1) queries for a while ***
> +start slave;
>
Wait for slave to start. There is a slight chance that you execute this
statement, switch to the master, and when doing sync_slave_with_master,
the master_pos_wait() will return NULL because the slave has not had a
chance to start.
> +
> +connection master;
> +
> +#connection slave;
> +sync_slave_with_master;
> +let $slave_relay_pos0= query_get_value(SHOW SLAVE STATUS, Relay_Log_Pos, 1);
> +
> +connection master;
> +create table t1 (n int);
> +let $master_binlog_end= query_get_value(SHOW MASTER STATUS, Position, 1);
> +
> +connection slave;
> +let $slave_param= Exec_Master_Log_Pos;
> +let $slave_param_value= $master_binlog_end;
> +source include/wait_for_slave_param.inc;
> +--echo *** must be empty as the event is to be filtered out ***
> +show tables;
> +--echo *** allowing events from master ***
> +let $slave_relay_pos1= query_get_value(SHOW SLAVE STATUS, Relay_Log_Pos, 1);
> +#
> +# checking stability of relay log pos
> +#
> +if (`select $slave_relay_pos1 - $slave_relay_pos0`)
> +{
> + --echo Error: relay log position changed: $slave_relay_pos0, $slave_relay_pos1
> + query_vertical show slave status;
>
This will print a lot of info that will cause the test to fail in
pushbuild. Use "include/show_slave_status.inc" or something equivalent
to filter out the pointless columns.
> +}
> +stop slave;
> +source include/wait_for_slave_to_stop.inc;
> +reset slave;
> +--echo *** must be empty due to reset slave ***
> +let $ignore_list= query_get_value(SHOW SLAVE STATUS, Replicate_Ignore_Server_id,
> 1);
> +echo ignore server id list: $ignore_list;
> +--replace_result $MASTER_MYPORT MASTER_PORT
> +eval change master to master_host='127.0.0.1', master_user='root',
> + master_password='', master_port=$MASTER_MYPORT;
>
> +start slave;
>
Wait for slave to start.
> +
> +connection master;
> +
> +#connection slave;
> +sync_slave_with_master;
> +--echo *** must have caught create table ***
> +show tables;
> +
> +# cleanup
> +connection master;
> +drop table t1;
> +#connection slave
> +sync_slave_with_master;
> +
> +--echo end of the tests
> +
> +
> +
> +
> +
> +
> diff -Nrup a/mysys/array.c b/mysys/array.c
> --- a/mysys/array.c 2007-10-11 18:07:33 +03:00
> +++ b/mysys/array.c 2008-05-08 21:00:32 +03:00
> @@ -382,3 +382,40 @@ int get_index_dynamic(DYNAMIC_ARRAY *arr
> return ret;
>
> }
> +
> +/**
> + Binary search in a specific type of the array having ulong items.
> + The items are ascendant-sorted.
> +
> + @param A pointer to an array
> + @param search pointer to a search pattern
> + @param index[in,out] pointer to a store for the index
> + of a found element or NULL - not store
> + @return
> + @retval FALSE if the pattern is found,
> + @retval TRUE otherwise
> + */
> +
> +my_bool dynamic_array_ulong_bin_search(DYNAMIC_ARRAY *array, const ulong *search,
> + uint * index)
> +{
> + DBUG_ENTER("dynamic_array_bin_search");
> + long l= 0, i, r= array->elements - 1;
> + while (l <= r)
> + {
> + ulong found;
> + DBUG_ASSERT((r < 0 || (ulong) r < array->elements) && l
> >=0);
> + i= (r + l)/2;
> + if ((found= (* (ulong*) dynamic_array_ptr(array, i))) == *search)
> + {
> + if (index)
> + *index= i;
> + DBUG_RETURN(FALSE);
> + }
> + if (found < *search)
> + l= i + 1;
> + else
> + r= i - 1;
> + }
> + DBUG_RETURN(TRUE);
> +}
>
If you are going to extend array.c with a function like this, make a
generic version. If you want a specialized one, keep it in the file
where you need it. Don't place a ulong-specific version in the mysys
library.
> diff -Nrup a/sql/lex.h b/sql/lex.h
> --- a/sql/lex.h 2008-04-01 13:56:38 +03:00
> +++ b/sql/lex.h 2008-05-08 21:00:32 +03:00
> @@ -321,6 +321,7 @@ static SYMBOL symbols[] = {
> { "MASTER_SSL_VERIFY_SERVER_CERT", SYM(MASTER_SSL_VERIFY_SERVER_CERT_SYM)},
> { "MASTER_USER", SYM(MASTER_USER_SYM)},
> { "MASTER_HEARTBEAT_PERIOD", SYM(MASTER_HEARTBEAT_PERIOD_SYM)},
> + { "MASTER_IGNORE_SERVER_ID", SYM(MASTER_IGNORE_SERVER_ID_SYM)},
>
What is "MASTER" referring to here? For "MASTER_USER" it is clearly the
user on the master, but what does the "MASTER" denote for
"IGNORE_SERVER_ID"?
> { "MATCH", SYM(MATCH)},
> { "MAX_CONNECTIONS_PER_HOUR", SYM(MAX_CONNECTIONS_PER_HOUR)},
> { "MAX_QUERIES_PER_HOUR", SYM(MAX_QUERIES_PER_HOUR)},
> diff -Nrup a/sql/rpl_mi.cc b/sql/rpl_mi.cc
> --- a/sql/rpl_mi.cc 2008-03-15 00:21:24 +02:00
> +++ b/sql/rpl_mi.cc 2008-05-08 21:00:32 +03:00
> @@ -28,17 +28,19 @@ int init_intvar_from_file(int* var, IO_C
> int init_strvar_from_file(char *var, int max_size, IO_CACHE *f,
> const char *default_val);
> int init_floatvar_from_file(float* var, IO_CACHE* f, float default_val);
> +int init_dynarray_intvar_from_file(DYNAMIC_ARRAY* arr, IO_CACHE* f);
>
> Master_info::Master_info()
> :Slave_reporting_capability("I/O"),
> ssl(0), ssl_verify_server_cert(0), fd(-1), io_thd(0), port(MYSQL_PORT),
> connect_retry(DEFAULT_CONNECT_RETRY), heartbeat_period(0),
> - received_heartbeats(0), inited(0),
> + received_heartbeats(0), inited(0), master_id(0),
> abort_slave(0), slave_running(0), slave_run_id(0)
> {
> host[0] = 0; user[0] = 0; password[0] = 0;
> ssl_ca[0]= 0; ssl_capath[0]= 0; ssl_cert[0]= 0;
> ssl_cipher[0]= 0; ssl_key[0]= 0;
> + my_init_dynamic_array(&ignore_server_ids, sizeof(::server_id), 16, 16);
>
> bzero((char*) &file, sizeof(file));
> pthread_mutex_init(&run_lock, MY_MUTEX_INIT_FAST);
> @@ -50,6 +52,7 @@ Master_info::Master_info()
>
> Master_info::~Master_info()
> {
> + delete_dynamic(&ignore_server_ids);
> pthread_mutex_destroy(&run_lock);
> pthread_mutex_destroy(&data_lock);
> pthread_cond_destroy(&data_cond);
> @@ -57,6 +60,27 @@ Master_info::~Master_info()
> pthread_cond_destroy(&stop_cond);
> }
>
> +/**
> + Reports if the s_id server has been configured to ignore events
> + it generates with
> +
> + CHANGE MASTER MASTER_IGNORE_SERVER_ID= s_id
> +
> + Method is called from the io thread event receiver filtering.
> +
> + @param s_id the master server identifier
> + @return
> + @retval TRUE if s_id is in the list of ignored master servers,
> + @retval FALSE otherwise.
> + */
> +bool Master_info::repl_server_id_ignore_p(ulong s_id)
> +{
> + if (likely(ignore_server_ids.elements == 1))
> + return (* (ulong*) dynamic_array_ptr(&ignore_server_ids, 0)) == s_id;
> + else
> + return !dynamic_array_ulong_bin_search(&ignore_server_ids,
> + (const ulong *) &s_id, NULL);
> +}
>
> void init_master_log_pos(Master_info* mi)
> {
> @@ -87,8 +111,11 @@ enum {
> /* 6.0 added value of master_heartbeat_period */
> LINE_FOR_MASTER_HEARTBEAT_PERIOD= 16,
>
> + /* 6.0 added value of master_ignore_server_id */
> + LINE_FOR_MASTER_IGNORE_SERVER_ID= 17,
> +
> /* Number of lines currently used when saving master info file */
> - LINES_IN_MASTER_INFO= LINE_FOR_MASTER_HEARTBEAT_PERIOD
> + LINES_IN_MASTER_INFO= LINE_FOR_MASTER_IGNORE_SERVER_ID
> };
>
>
> @@ -280,6 +307,17 @@ file '%s')", fname);
> if (lines >= LINE_FOR_MASTER_HEARTBEAT_PERIOD &&
> init_floatvar_from_file(&master_heartbeat_period, &mi->file,
> 0.0))
> goto errwithmsg;
> + /*
> + Starting from 6.0 list of server_id of ignorable servers might be
> + in the file
> + */
> + if (lines >= LINE_FOR_MASTER_IGNORE_SERVER_ID &&
> + init_dynarray_intvar_from_file(&mi->ignore_server_ids,
> &mi->file))
> + {
> + sql_print_error("Failed to initialize master info ignore_server_ids");
> + goto errwithmsg;
> + }
> +
> }
>
> #ifndef HAVE_OPENSSL
> @@ -379,16 +417,40 @@ int flush_master_info(Master_info* mi, b
> */
> char heartbeat_buf[sizeof(mi->heartbeat_period) * 4]; // buffer to suffice
> always
> my_sprintf(heartbeat_buf, (heartbeat_buf, "%.3f", mi->heartbeat_period));
> + /*
> + produce a line listing the total number and all the ignored server_id:s
> + */
> + char* ignore_server_ids_buf;
> + {
> + ignore_server_ids_buf=
> + (char *) my_malloc((sizeof(::server_id) * 3 + 1) *
> + (1 + mi->ignore_server_ids.elements), MYF(MY_WME));
> + if (!ignore_server_ids_buf)
> + DBUG_RETURN(1);
> + my_sprintf(ignore_server_ids_buf,
> + (ignore_server_ids_buf, "%u", mi->ignore_server_ids.elements));
> + for (ulong i= 0; i < mi->ignore_server_ids.elements; i++)
> + {
> + ulong s_id;
> + ulong cur_len= strlen(ignore_server_ids_buf);
>
You are calling the strlen() with each iteration when you can just keep
track of it using the buffer pointer and maintain the a pointer to the
current position by using the results of the my_sprintf(). Although this
is not a critical loop, this is not something we should do.
> + get_dynamic(&mi->ignore_server_ids, (uchar*) &s_id, i);
> + my_sprintf(ignore_server_ids_buf + cur_len,
> + (ignore_server_ids_buf + cur_len,
> + " %lu", s_id));
> + }
> + }
> my_b_seek(file, 0L);
> my_b_printf(file,
> - "%u\n%s\n%s\n%s\n%s\n%s\n%d\n%d\n%d\n%s\n%s\n%s\n%s\n%s\n%d\n%s\n",
> +
> "%u\n%s\n%s\n%s\n%s\n%s\n%d\n%d\n%d\n%s\n%s\n%s\n%s\n%s\n%d\n%s\n%s\n",
> LINES_IN_MASTER_INFO,
> mi->master_log_name, llstr(mi->master_log_pos, lbuf),
> mi->host, mi->user,
> mi->password, mi->port, mi->connect_retry,
> (int)(mi->ssl), mi->ssl_ca, mi->ssl_capath, mi->ssl_cert,
> mi->ssl_cipher, mi->ssl_key, mi->ssl_verify_server_cert,
> - heartbeat_buf);
> + heartbeat_buf,
> + ignore_server_ids_buf);
> + my_free(ignore_server_ids_buf, MYF(0));
> DBUG_RETURN(-flush_io_cache(file));
> }
>
> @@ -399,6 +461,7 @@ void end_master_info(Master_info* mi)
>
> if (!mi->inited)
> DBUG_VOID_RETURN;
> + reset_dynamic(&mi->ignore_server_ids);
> end_relay_log_info(&mi->rli);
> if (mi->fd >= 0)
> {
> diff -Nrup a/sql/rpl_mi.h b/sql/rpl_mi.h
> --- a/sql/rpl_mi.h 2008-02-03 11:00:47 +02:00
> +++ b/sql/rpl_mi.h 2008-05-08 21:00:32 +03:00
> @@ -20,6 +20,7 @@
>
> #include "rpl_rli.h"
> #include "rpl_reporting.h"
> +#include "my_sys.h"
>
>
> /*****************************************************************************
> @@ -60,6 +61,7 @@ class Master_info : public Slave_reporti
> public:
> Master_info();
> ~Master_info();
> + bool repl_server_id_ignore_p(ulong s_id);
>
> /* the variables below are needed because we can change masters on the fly */
> char master_log_name[FN_REFLEN];
> @@ -85,10 +87,12 @@ class Master_info : public Slave_reporti
> uint connect_retry;
> float heartbeat_period; // interface with CHANGE MASTER or master.info
> ulonglong received_heartbeats; // counter of received heartbeat events
> + DYNAMIC_ARRAY ignore_server_ids;
> #ifndef DBUG_OFF
> int events_till_disconnect;
> #endif
> bool inited;
> + volatile ulong master_id;
>
Why volatile?
> volatile bool abort_slave;
> volatile uint slave_running;
> volatile ulong slave_run_id;
> diff -Nrup a/sql/share/errmsg.txt b/sql/share/errmsg.txt
> --- a/sql/share/errmsg.txt 2008-04-14 19:58:40 +03:00
> +++ b/sql/share/errmsg.txt 2008-05-08 21:00:33 +03:00
> @@ -6252,3 +6252,5 @@ ER_SLAVE_HEARTBEAT_FAILURE
> eng "Unexpected master's heartbeat data: %s"
> ER_SLAVE_HEARTBEAT_VALUE_OUT_OF_RANGE
> eng "The requested value for the heartbeat period %s %s"
> +ER_SLAVE_IGNORE_SERVER_ID
> + eng "The requested server id %d clashes with the slave startup option
> --replicate-same-server-id"
> diff -Nrup a/sql/slave.cc b/sql/slave.cc
> --- a/sql/slave.cc 2008-04-01 13:56:42 +03:00
> +++ b/sql/slave.cc 2008-05-08 21:00:32 +03:00
> @@ -759,6 +759,95 @@ int init_floatvar_from_file(float* var,
> DBUG_RETURN(1);
> }
>
> +/**
> + A master info read method
> +
> + This function is called from @c init_master_info() @c along with
> + relatives to restore some of @c active_mi @c members.
> + Particularly, this function is responsible for restoring
> + MASTER_IGNORE_SERVER_ID list of servers whose events the slave is
> + going to ignore (to not log them in the relay log).
> + Items being read are supposed to be decimal output of values of a
> + type shorter or equal of @c long @c and separated by the single space.
> +
> + @param arr @c DYNAMIC_ARRAY @c pointer to storage for servers id
> + @param f @c IO_CACHE @c pointer to the source file
> +
> + @return
> + @retval 0 All OK
> + @retval non-zero An error
> +*/
> +
> +int init_dynarray_intvar_from_file(DYNAMIC_ARRAY* arr, IO_CACHE* f)
> +{
> + int ret= 0;
> + char buf[16 * (sizeof(long)*4 + 1)]; // static buffer to use most of times
> + char *buf_act= buf; // actual buffer can be dynamic if static is short
> + char *token;
> + uint num_items; // number of items of `arr'
> + size_t read_size;
> + DBUG_ENTER("init_dynarray_intvar_from_file");
> +
> + if ((read_size= my_b_gets(f, buf_act, sizeof(buf))) == 0)
> + {
> + return 0; // no line in master.info
> + }
> + if (read_size + 1 == sizeof(buf) && buf[sizeof(buf) - 2] != '\n')
> + {
> + /*
> + short read happpend; allocate sufficient memory and make the 2nd read
> + */
> + char buf_work[(sizeof(long)*3 + 1)*16];
> + memcpy(buf_work, buf, sizeof(buf_work));
> + num_items= atoi(strtok(buf_work, " "));
> + size_t snd_size;
> + /*
> + max size lower bound approximate estimation bases on the formula:
> + (the items number + items themselves) *
> + (decimal size + space) - 1 + `\n' + '\0'
> + */
> + size_t max_size= (1 + num_items) * (sizeof(long)*3 + 1) + 1;
> + buf_act= (char*) my_malloc(max_size, MYF(MY_WME));
> + memcpy(buf_act, buf, read_size);
> + snd_size= my_b_gets(f, buf_act + read_size, max_size - read_size);
> + if (snd_size == 0 ||
> + (snd_size + 1 == max_size - read_size) && buf[max_size - 2] !=
> '\n')
> + {
> + /*
> + failure to make the 2nd read or short read again
> + */
> + ret= 1;
> + goto err;
> + }
> + }
> + token= strtok(buf_act, " ");
>
strtok() is not thread-safe. Please stay away from it.
> + if (token == NULL)
> + {
> + ret= 1;
> + goto err;
> + }
> + num_items= atoi(token);
> + for (uint i=0; i < num_items; i++)
> + {
> + token= strtok(NULL, " ");
> + if (token == NULL)
> + {
> + ret= 1;
> + goto err;
> + }
> + else
> + {
> + ulong val= atol(token);
> + insert_dynamic(arr, (uchar *) &val);
> + }
> + }
> +err:
> + if (buf_act != buf)
> + my_free(buf_act, MYF(0));
> + DBUG_RETURN(ret);
> +}
> +
> +
> static bool check_io_slave_killed(THD *thd, Master_info *mi, const char *info)
> {
> if (io_slave_killed(thd, mi))
> @@ -907,7 +996,7 @@ static int get_master_version_and_clock(
> (master_res= mysql_store_result(mysql)))
> {
> if ((master_row= mysql_fetch_row(master_res)) &&
> - (::server_id == strtoul(master_row[1], 0, 10)) &&
> + (::server_id == (mi->master_id= strtoul(master_row[1], 0, 10)))
> &&
> !mi->rli.replicate_same_server_id)
> {
> errmsg=
> @@ -923,7 +1012,14 @@ static int get_master_version_and_clock(
> if (errmsg)
> goto err;
> }
> -
> + if (mi->master_id == 0 && mi->ignore_server_ids.elements > 0)
> + {
> + errmsg="Slave configured with server id filtering could not detect the master
> server id.";
> + err_code= ER_SLAVE_FATAL_ERROR;
> + sprintf(err_buff, ER(err_code), errmsg);
> + err_msg.append(err_buff);
> + goto err;
> + }
> /*
> Check that the master's global character_set_server and ours are the same.
> Not fatal if query fails (old master?).
> @@ -1238,6 +1334,10 @@ bool show_master_info(THD* thd, Master_i
> field_list.push_back(new Item_empty_string("Last_IO_Error", 20));
> field_list.push_back(new Item_return_int("Last_SQL_Errno", 4, MYSQL_TYPE_LONG));
> field_list.push_back(new Item_empty_string("Last_SQL_Error", 20));
> + field_list.push_back(new Item_empty_string("Replicate_Ignore_Server_id",
> + FN_REFLEN));
>
Please be consistent: "Replication_Ignore_Server_Id" Also, I question
the need for "Replication" and think that "Ignore_Server_Ids" or
"Ignore_Server_Id_List" is more appropriate. Same for the variable given
to CHANGE MASTER. In addition, they should have the same name (not
counting case) to avoid confusion.
> + field_list.push_back(new Item_return_int("Master_Server_Id", sizeof(ulong),
> + MYSQL_TYPE_LONG));
>
> if (protocol->send_fields(&field_list,
> Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF))
> @@ -1357,6 +1457,32 @@ bool show_master_info(THD* thd, Master_i
> protocol->store(mi->rli.last_error().number);
> // Last_SQL_Error
> protocol->store(mi->rli.last_error().message, &my_charset_bin);
> + // Replicate_Ignore_Server_id
> + {
> + char buff[FN_REFLEN];
> + ulong i, cur_len;
> + for (i= 0, buff[0]= 0,cur_len= 0;
> + i < mi->ignore_server_ids.elements; i++, cur_len=strlen(buff))
> + {
> + ulong s_id;
> + char sbuff[FN_REFLEN];
> + get_dynamic(&mi->ignore_server_ids, (uchar*) &s_id, i);
> + my_sprintf(sbuff, (sbuff, (i==0? "%lu" : ", %lu"), s_id));
> + if (cur_len + strlen(sbuff) + 4 > FN_REFLEN)
>
You're computing strlen() inside a loop again when you can just use keep
the length available using the result of my_sprintf(). It's quadratic
when it should be linear.
> + {
> + /*
> + break the loop whenever remained space could not fit
> + ellipses on the next cycle
> + */
> + my_sprintf(buff + cur_len, (buff + cur_len, "..."));
> + break;
> + }
> + my_sprintf(buff + cur_len, (buff + cur_len, "%s", sbuff));
> + }
> + protocol->store(buff, &my_charset_bin);
> + }
> + // Master_Server_id
> + protocol->store((uint32) mi->master_id);
>
> pthread_mutex_unlock(&mi->rli.data_lock);
> pthread_mutex_unlock(&mi->data_lock);
> @@ -3069,6 +3195,7 @@ static int queue_event(Master_info* mi,c
> ulong inc_pos;
> Relay_log_info *rli= &mi->rli;
> pthread_mutex_t *log_lock= rli->relay_log.get_log_lock();
> + ulong s_id;
> DBUG_ENTER("queue_event");
>
> LINT_INIT(inc_pos);
> @@ -3215,9 +3342,14 @@ static int queue_event(Master_info* mi,c
> */
>
> pthread_mutex_lock(log_lock);
> -
> - if ((uint4korr(buf + SERVER_ID_OFFSET) == ::server_id) &&
> - !mi->rli.replicate_same_server_id)
> + s_id= uint4korr(buf + SERVER_ID_OFFSET);
> + if ((s_id == ::server_id && !mi->rli.replicate_same_server_id) ||
> + (mi->repl_server_id_ignore_p(s_id) &&
> + /* everything is filtered out from non-master */
> + (s_id != mi->master_id ||
> + /* for the master meta information is necessary */
> + buf[EVENT_TYPE_OFFSET] != FORMAT_DESCRIPTION_EVENT &&
> + buf[EVENT_TYPE_OFFSET] != ROTATE_EVENT)))
> {
> /*
> Do not write it to the relay log.
> @@ -3232,8 +3364,12 @@ static int queue_event(Master_info* mi,c
> But events which were generated by this slave and which do not exist in
> the master's binlog (i.e. Format_desc, Rotate & Stop) should not
> increment
> mi->master_log_pos.
> + If the event is originated remotely and is being filtered out by
> + MASTER_IGNORE_SERVER_ID it increments mi->master_log_pos
> + as well as rli->group_relay_log_pos.
> */
> - if (buf[EVENT_TYPE_OFFSET]!=FORMAT_DESCRIPTION_EVENT &&
> + if (!(s_id == ::server_id && !mi->rli.replicate_same_server_id) ||
> + buf[EVENT_TYPE_OFFSET]!=FORMAT_DESCRIPTION_EVENT &&
> buf[EVENT_TYPE_OFFSET]!=ROTATE_EVENT &&
> buf[EVENT_TYPE_OFFSET]!=STOP_EVENT)
>
It's not something you did, but could you please add spaces around the != ?
> {
> @@ -3243,8 +3379,8 @@ static int queue_event(Master_info* mi,c
> rli->ign_master_log_pos_end= mi->master_log_pos;
> }
> rli->relay_log.signal_update(); // the slave SQL thread needs to re-check
> - DBUG_PRINT("info", ("master_log_pos: %lu, event originating from the same
> server, ignored",
> - (ulong) mi->master_log_pos));
> + DBUG_PRINT("info", ("master_log_pos: %lu, event originating from %u server,
> ignored",
> + (ulong) mi->master_log_pos, uint4korr(buf +
> SERVER_ID_OFFSET)));
> }
> else
> {
> diff -Nrup a/sql/sql_lex.h b/sql/sql_lex.h
> --- a/sql/sql_lex.h 2008-04-01 13:56:44 +03:00
> +++ b/sql/sql_lex.h 2008-05-08 21:00:32 +03:00
> @@ -211,6 +211,7 @@ typedef struct st_lex_master_info
> char *ssl_key, *ssl_cert, *ssl_ca, *ssl_capath, *ssl_cipher;
> char *relay_log_name;
> ulong relay_log_pos;
> + DYNAMIC_ARRAY repl_ignore_server_ids;
> } LEX_MASTER_INFO;
>
>
> diff -Nrup a/sql/sql_repl.cc b/sql/sql_repl.cc
> --- a/sql/sql_repl.cc 2008-03-08 12:14:45 +02:00
> +++ b/sql/sql_repl.cc 2008-05-08 21:00:32 +03:00
> @@ -1189,32 +1189,41 @@ void kill_zombie_dump_threads(uint32 sla
> }
> }
>
> +/**
> + A comparison function to be supplied as argument to @c sort_dynamic() @c
> + provide ascendant sort of a ulong-items dynamic array
> +*/
> +static int change_master_server_id_cmp(ulong *id1, ulong *id2)
> +{
> + return (*id1) > (*id2);
>
Parentheses are not needed here.
> +}
>
> bool change_master(THD* thd, Master_info* mi)
> {
> int thread_mask;
> const char* errmsg= 0;
> bool need_relay_log_purge= 1;
> + bool ret= FALSE;
> DBUG_ENTER("change_master");
>
> lock_slave_threads(mi);
> init_thread_mask(&thread_mask,mi,0 /*not inverse*/);
> + LEX_MASTER_INFO* lex_mi= &thd->lex->mi;
> if (thread_mask) // We refuse if any slave thread is running
> {
> my_message(ER_SLAVE_MUST_STOP, ER(ER_SLAVE_MUST_STOP), MYF(0));
> - unlock_slave_threads(mi);
> - DBUG_RETURN(TRUE);
> + ret= TRUE;
> + goto err;
> }
>
> thd_proc_info(thd, "Changing master");
> - LEX_MASTER_INFO* lex_mi= &thd->lex->mi;
> // TODO: see if needs re-write
> if (init_master_info(mi, master_info_file, relay_log_info_file, 0,
> thread_mask))
> {
> my_message(ER_MASTER_INFO, ER(ER_MASTER_INFO), MYF(0));
> - unlock_slave_threads(mi);
> - DBUG_RETURN(TRUE);
> + ret= TRUE;
> + goto err;
> }
>
> /*
> @@ -1259,6 +1268,27 @@ bool change_master(THD* thd, Master_info
> mi->heartbeat_period= (float) min(SLAVE_MAX_HEARTBEAT_PERIOD,
> (slave_net_timeout/2.0));
> mi->received_heartbeats= LL(0); // counter lives until master is CHANGEd
> + /*
> + reset the last time server_id list if the current CHANGE MASTER
> + is mentioning MASTER_IGNORE_SERVER_ID
> + */
> + if (lex_mi->repl_ignore_server_ids.elements > 0)
> + reset_dynamic(&mi->ignore_server_ids);
> + for (uint i= 0; i < lex_mi->repl_ignore_server_ids.elements; i++)
> + {
> + ulong s_id;
> + get_dynamic(&lex_mi->repl_ignore_server_ids, (uchar*) &s_id, i);
> + if (s_id == ::server_id && replicate_same_server_id)
> + {
> + my_error(ER_SLAVE_IGNORE_SERVER_ID, MYF(0), s_id);
> + ret= TRUE;
> + goto err;
> + }
> + else
> + insert_dynamic(&mi->ignore_server_ids, (uchar*) &s_id);
> + }
> + sort_dynamic(&mi->ignore_server_ids, (qsort_cmp)
> change_master_server_id_cmp);
>
I see no reason to reset this list when doing a CHANGE MASTER. Is there
a good reason for it?
> +
> if (lex_mi->ssl != LEX_MASTER_INFO::LEX_MI_UNCHANGED)
> mi->ssl= (lex_mi->ssl == LEX_MASTER_INFO::LEX_MI_ENABLE);
>
> @@ -1336,8 +1366,8 @@ bool change_master(THD* thd, Master_info
> if (flush_master_info(mi, 0))
> {
> my_error(ER_RELAY_LOG_INIT, MYF(0), "Failed to flush master info file");
> - unlock_slave_threads(mi);
> - DBUG_RETURN(TRUE);
> + ret= TRUE;
> + goto err;
> }
> if (need_relay_log_purge)
> {
> @@ -1348,8 +1378,8 @@ bool change_master(THD* thd, Master_info
> &errmsg))
> {
> my_error(ER_RELAY_LOG_FAIL, MYF(0), errmsg);
> - unlock_slave_threads(mi);
> - DBUG_RETURN(TRUE);
> + ret= TRUE;
> + goto err;
> }
> }
> else
> @@ -1364,8 +1394,8 @@ bool change_master(THD* thd, Master_info
> &msg, 0))
> {
> my_error(ER_RELAY_LOG_INIT, MYF(0), msg);
> - unlock_slave_threads(mi);
> - DBUG_RETURN(TRUE);
> + ret= TRUE;
> + goto err;
> }
> }
> /*
> @@ -1402,10 +1432,13 @@ bool change_master(THD* thd, Master_info
> pthread_cond_broadcast(&mi->data_cond);
> pthread_mutex_unlock(&mi->rli.data_lock);
>
> +err:
> unlock_slave_threads(mi);
> thd_proc_info(thd, 0);
> - my_ok(thd);
> - DBUG_RETURN(FALSE);
> + if (ret == FALSE)
> + my_ok(thd);
> + delete_dynamic(&lex_mi->repl_ignore_server_ids); //freeing of parser-time
> alloc
> + DBUG_RETURN(ret);
> }
>
> int reset_master(THD* thd)
> diff -Nrup a/sql/sql_yacc.yy b/sql/sql_yacc.yy
> --- a/sql/sql_yacc.yy 2008-04-01 16:19:00 +03:00
> +++ b/sql/sql_yacc.yy 2008-05-08 21:00:32 +03:00
> @@ -892,6 +892,7 @@ bool my_yyoverflow(short **a, YYSTYPE **
> %token MASTER_SYM
> %token MASTER_USER_SYM
> %token MASTER_HEARTBEAT_PERIOD_SYM
> +%token MASTER_IGNORE_SERVER_ID_SYM
> %token MATCH /* SQL-2003-R */
> %token MAX_CONNECTIONS_PER_HOUR
> %token MAX_QUERIES_PER_HOUR
> @@ -1636,6 +1637,9 @@ change:
> LEX *lex = Lex;
> lex->sql_command = SQLCOM_CHANGE_MASTER;
> bzero((char*) &lex->mi, sizeof(lex->mi));
> + lex->mi.heartbeat_opt= LEX_MASTER_INFO::LEX_MI_UNCHANGED;
> + my_init_dynamic_array(&Lex->mi.repl_ignore_server_ids,
> + sizeof(::server_id), 16, 16);
> }
> master_defs
> {}
> @@ -1735,6 +1739,10 @@ master_def:
> Lex->mi.heartbeat_opt= LEX_MASTER_INFO::LEX_MI_DISABLE;
> }
> Lex->mi.heartbeat_opt= LEX_MASTER_INFO::LEX_MI_ENABLE;
> + }
> + | MASTER_IGNORE_SERVER_ID_SYM EQ ulong_num
> + {
> + insert_dynamic(&Lex->mi.repl_ignore_server_ids, (uchar*)
> &($3));
> }
> |
> master_file_def
>
>
--
Mats Kindahl
Lead Software Developer
Replication Team
MySQL AB, www.mysql.com