List:Commits« Previous MessageNext Message »
From:Andrei Elkin Date:May 30 2008 4:07pm
Subject:Re: bk commit into 6.0 tree (aelkin:1.2634) BUG#25998
View as plain text  
Mats, hello.

> Andrei Elkin wrote:
>> Mats, hej.
>> 
>>> Hi Andrei!
>>>
>>> Comments inline below.
>>>
>> 
>> Thanks for comments! I tagged with (todo) some of points of discussion.
>> A newer patch will come later today.
>> 
>>> Just my few cents,
>>> Mats Kindahl
>>>
>>> Andrei Elkin wrote:
>>>> Below is the list of changes that have just been committed into a local
>>>> 6.0 repository of aelkin.  When aelkin does a push these changes
>>>> will be propagated to the main repository and, within 24 hours after the
>>>> push, to the public repository.
>>>> For information on how to access the public repository
>>>> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
>>>>
>>>> ChangeSet@stripped, 2008-05-08 21:00:35+03:00, aelkin@stripped
> +13 -0
>>>>   Bug #25998 problems about circle replication
>>>>   Bug #27808 Infinite looping in circular replication
>>>>     In case of withdrawing one of the servers from the circular
>>>> multi-master replication group
>>>>   events generated by the removed server could become unstoppable
> (bug#25998).
>>>>   That's because the originator had been the terminator of the own event
> flow.
>>>>   
>>> "circular multi-master replication group" ? 


>>> We don't have multi-master. 

>> 
>> Perhaps you confused with multi-sourced that we don't have indeed.
>> It's okay with multi-master. 

>> http://en.wikipedia.org/wiki/Multi-master_replication

>> 
>>> If you mean something else, then please re-write this
>>> paragraph to clarify the situation.
>>>
>> 

>> confirms we have multi-master, and I am proud of that -:)

>
> Sorry?

I invited you to read the page. It says MySQL has multi-master.
And I expected you would share the proud :-)

>
> [snip]
>
>
>> 
>>> "Artifact" is not a good word in these situations, you should use
>>> "problem" instead.
>> 
>> I'd rather stay off a linguistic discussion.
>> No problem with me if I need to change this place indeed.
>> Still, http://en.wikipedia.org/wiki/Artifact_%28error%29.
>
> That's up to you.
>

Thanks. I hope, it's done better in a newer patch.


>>>>     MASTER_IGNORE_SERVER_ID= some_server_id
>>>>     can be repeated multiple times to list all the servers from
>>>> which events would not be put into
>>>>   relay log and therefore would not be executed.
>>>>   
>>> Why not have a comma-separated list of server ids?
>>>
>> 

Well, never mind on the paragraph below.
I think you are right.
If an identifier the parser assigns to is a list, let's make the rhs
as the list.

>> There are many reasons actually.
>> 
>> Practically it's going to be a single identifier at most;
>> the comma is bound to separate the top level sub-clauses such as
>> master_port=, master_host=, etc, so that engaging the comma with
>> more job would require re-writing the current parsing
>> rules, particularly for `master_defs';
>> imo it would not be a perfect solution anyway without introducing
>> a new token(s) for grouping, e.g parenthesis "(", ")"
>> 
>>   something = (item_1, item_2, \dots, item_n)
>> 
>> \footnote{but I would prefer then the white space as separator in the list
>>   something = (item_1 item_2 \dots item_n)
>> }
>
> The syntax for writing sets in SQL is '1,2,3,4'. You can use that, i.e.,
> a string with comma-separated integers inside. That means you do not
> have to extend the syntax with anything complex, just accept a string
> and then you parse it internally.

sure, the comma.

(that my lisp love was saying ;)


>
> [snip]
>
>>>>    d. resetting the ignored server ids with RESET SLAVE;
>>>>   
>>> RESET SLAVE does not change MASTER_HOST, nor any of the other
>>> parameters, so it does not make sense that it should reset the list of
>>> ignored server ids.
>> 
>> A good note. Indeed, it does change:
>> 
>>    13.6.2.5 `RESET SLAVE' Syntax
>>    .............................
>> 
>>    `RESET SLAVE' makes the slave forget its replication position in the
>>    master's binary logs. This statement is meant to be used for a clean
>>    start: It deletes the `master.info' and `relay-log.info' files, all the
>>    relay logs, and starts a new relay log.
>> 
>> If after `reset slave' the slave server restarts, it won't find
>> nothing about the last time configuration.
>> 
>> The verb "reset" and more the quoted paragraph does not make sense
>> for not changing MASTER_HOST etc, which we are used to observe when
>> `reset slave' can be effectively continued with `start slave'.
>> I find the current behavior not compliant to the docs.
>> The two must be separated with `CHANGE MASTER' imo.
>> 
>> So your note actually manifests a new bug that we have to fix.
>
> Maybe so. I agree that the binlog positions do reset, and that the
> manual seems to be wrong, but the host does not reset, nor the port, nor
> any of the other "fixed" parameters. I find this list of ignored
> identifier most similar to this list, and therefore I would like to have
> it stay and not be reset, precisely so that one can do a START SLAVE
> afterwards.

I'd rather to be sympathetic to your claim, as it's in my principles
not to do any extra (i.e nobody/nothing asks for or needs) work.
However, i could not help make resetting as the docs I think is
conceptually correct, implementation not.

One of the arguments that we can consider is if the slave server is
shut down after RESET SLAVE, it will have to re-new master info with 
CHANGE MASTER. Without shutdown, the slave server is maintaining the
master connecting parameters in its volatile memory.

I think we can leave the current undocumented behavior for
backward compatibility. Maybe to document it, but then any new feature
should be questioned about a need to be reset or left volatile.
Imo, ultimately, RESET MASTER needs always to follow with CHANGE MASTER.

I think we need to vote on this issue.

1. this bug issue voting agenda:

   Shall we enforce that a new option to CHANGE MASTER will survive
   reset master. I.e at the end of the following sequence of queries
   
   change master ignore_server_id=(1,2,3);
   reset master;
   start master;

   wrt to the current IGNORE_SERVER_ID option we shall have either:

   SHOW SLAVE STATUS like `ignore_server_id' => 1,2,3

   or another alternative

   SHOW SLAVE STATUS like `ignore_server_id' => empty


2. a general agenda for voting, (can be postponed till better time)

   Shall we continue with undocumented policy (convert it to
   documented) for the slave server to memorize in its volatile memory
   parameters responsible for successful connecting to the master like
   master_host, master_port, master_user, master_password (some more),
   so that RESET SLAVE can be followed immediately with START SLAVE
   without an intermediate CHANGE MASTER

   or

   we shall enforce the user after RESET SLAVE always follow
   with a new CHANGE MASTER so that 

   RESET SLAVE, shutdown, START SLAVE
   RESET SLAVE, START SLAVE

   will work uniformly trying to connect to nowhere, and the
   only effective way to resume replication after RESET SLAVE is 

   CHANGE MASTER; START SLAVE;



>
> [snip]
>
>>>>   mysys/array.c@stripped, 2008-05-08 21:00:32+03:00,
> aelkin@stripped +37 -0
>>>>     Implementation of the search function.
>>>>     The third in-out arg is provided for the caller if it needs the
> index
>>>>     of the found search pattern.
>>>>   
>>> "The third in-out arg"?! I cannot decode this without reading the
>>> code. Give the function name, what parameter you added, and the
>>> purpose of that parameter.
>> 
>> I'd rather to remove the details of implementation from the comments.
>> Really, one can read the synopsis in the sources.
>
> You have already provided implementation details by talking about the
> position of an argument to a function. I agree that it might be better
> to keep implementation details out of the comment, but then the comment
> needs rewriting anyway.
>
>> 
>> (todo)
>> 
>>>>   sql/lex.h@stripped, 2008-05-08 21:00:32+03:00,
> aelkin@stripped +1 -0
>>>>     new keyword
>>>>   
>>> What new keyword?
>> 
>> indeed, a new member of struct st_lex_master_info (todo)
>
> Just name it, please.

ok

>
>>>>   sql/rpl_mi.cc@stripped, 2008-05-08 21:00:32+03:00,
> aelkin@stripped +67 -4
>>>>     initialization, storing into master info, reading from the file,
> deallocation
>>>>     of the list of ignored server id:s.
>>>>   
>>> "initialization"? What did you do here?
>>> "storing into master info"? Storing what into master info? What did
>>> you do here?
>>> "reading from the file"? What file? What did you do here?
>>>
>> 
>> I am lost in the number of questions...
>> 
>> I seriously doubt you had a hard time to find out what is supposed to be
>> stored into, read from the master info file :-)
>> Besides, it's said explicitly at the end of the clause - "of the list
>> of ignored server id:s". The latter is the common object of several
>> operations mentioned in the comment clause.
>> 
>> One point of your critique I have to accept:
>> "initialization" meant allocation actually (of what is said at the end of the
>> clause). (todo)
>> 
>> Yes, there is yet another initialization. Of the new `master_id'
>> member of Master_info.
>> That's not said. Neither it is worth to mention as there is the source
>> of the patch.
>> 
>> "Reading from the file" means reading from a file that was last time
>> mentioned.
>> 
>> "What did you do here?" obviously, the slave reads from the master
>> info file, the new line with the ignored server id list.
>

> The point is that the comment is a fragment and does not make any sense
> as it stands. I would prefer if you could write out the comment so that
> it makes sense without having to look at the code.

It's too strong requirement to provide comments eliminating need to
look as the source code. Comments are by the definition a part of it.
In my opinion, none, incl a reviewer, can demand such service as
release from reading the sources.

Contradictory comments cases or a case of partly qualified references,
like mentioning a member but not the class (that happened to me above)
- this is something I am eager to consider.


>
> Normally one browses the file comments when trying to find a specific
> change, like if there were any changes to how a variable was operated
> on, functions that changed behavior, etc. Reading 100+ file comments all
> saying that something was initialized does not really help in this
> situation, and then the comment is pointless and is not needed.
>

I would share your objection if the qualification of an object
subjected to changes does not suffice to identify it or, worse, misleads
a reviewer.
However, it was not "something" but has been explicitly.


> [snip]
>
>>>>   sql/sql_lex.h@stripped, 2008-05-08 21:00:32+03:00,
> aelkin@stripped +1 -0
>>>>     parser time storage for variable size array of server ids
>>>>
>>>>   sql/sql_repl.cc@stripped, 2008-05-08 21:00:32+03:00,
> aelkin@stripped +46 -13
>>>>     aux function for sorting the ignored server_id:s;
>>>>     sorting of the ignored servers implementation;
>>>>     failing change master do not forget to free lex's dynamical storage
> with parsed
>>>>     ignored server_ids;
>>>>     resetting the old list of ignored server if the parser detected the
> new list;
>>>>     shifting some pieces of code in order to comply with the above
> logics;
>>>>
>>>>   sql/sql_yacc.yy@stripped, 2008-05-08 21:00:32+03:00,
> aelkin@stripped +8 -0
>>>>     allocation a dyn array for possible list of ignored server ids;
>>>>     filling the array;
>>>>     resetting (missed part of wl#342) heartbeat_opt at the beginning of
>>>>     CHANGE MASTER processing;
>>>>   
>>> I don't follow why you need to reset the heartbeat option when doing
>>> CHANGE MASTER and why you need to do this change in this patch.
>> 
>> We have a convention that absence of MASTER_HEARTBEAT_PERIOD in the args
>> means heartbeat cancellation. The easiest way to enforce the rule it to
>> reset a possible last time non-zero value at the beginning of CHANGE MASTER
>> parsing.
>> I might have reported a bug, but think piggybacking on the current
>> ticket would be okay.
>
> OK. Could you please write out that we have this assumption both in the
> code and in the cset so that nobody is confused.
>

ok

> [snip]
>
>>>> +
>>>> +source include/master-slave.inc;
>>>> +
>>>> +connection slave;
>>>> +source include/wait_for_slave_to_start.inc;
>>>>   
>>> Is this necessary? 
>> 
>> Sure.
>> As the test is going show `master_id' the new member of `show slave
>> status' which is the server id of the master.
>> 
>>> Shouldn't this line be added to master-slave.inc
>>> instead in that case?
>> 
>> An interesting note.
>> I'd rather it to be a part of `start slave', which is called from 
>> include/master-slave.inc as well.
>

> I would say that it is expected that after master-slave.inc has been
> executed, test is ready to run, including that the slave has started.
>
> I would prefer you could add that line to master-slave.inc instead.

Well, two things.

The main, i did not notice last time that it's not the test of SQL thread

  include/wait_for_slave_to_start.inc 

but synchronization with the master

  sync_slave_with_master;

which was meant, sorry.

Second is a footnote{it would not be instead. "Instead" would be to do that
in `start slave' handler because that suffice to
`include/master-slave.inc', not vice versa}



>
>> 
>>>> +# a new line for master_id
>>>> +let $master_id= query_get_value(SHOW SLAVE STATUS, Master_Server_Id,
> 1);
>>>> +echo master_id: $master_id;
>>>> +
>>>> +stop slave;
>>>>   
>>> wait for slave to stop.
>>>
>> 
>> I don't think we really need to wait for stop, actually.
>> In the following the test is concerned with results of CHANGE MASTER
>> and execution on the slave after its restarting.
>> Later in the test, there is sync_slave_with_master synchoronization
>> that supersedes wait for stop and wait for start both.
>
> True. Do as you like.
>
> [snip]
>
>> 
>>>> +start slave;
>>>>   
>> 
>>> Wait for slave to start. There is a slight chance that you execute
>>> this statement, switch to the master, and when doing
>>> sync_slave_with_master, the master_pos_wait() will return NULL because
>>> the slave has not had a chance to start.

Could you please test your claim scenario?
Reading the code does not confirm it.

Even the comments :-) in start_slave():


 slave_errno = start_slave_threads(0 /*no mutex */,
					1 /* wait for start */,
					mi,
					master_info_file,relay_log_info_file,
					thread_mask);

>>>
>> 
>> I understand your worries:
>> 
>> " The function returns `NULL' if the slave SQL thread is not started "
>> 
>> However, `slave start' returns OK to mysqltest only after SQL thread has been
>> started.
>

And my own test confirm that ^.


> Not true. Start slave just initiates the slave start, it returns before
> the thread has been properly started, so the chance I mention above is
> valid.

I think you need to test it yourself.

>
>> 
>> 
>>>> +
>>>> +connection master;
>>>> +
>>>> +#connection slave;
>>>> +sync_slave_with_master;
>>>> +let $slave_relay_pos0= query_get_value(SHOW SLAVE STATUS, Relay_Log_Pos,
> 1);
>>>> +
>>>> +connection master;
>>>> +create table t1 (n int);
>>>> +let $master_binlog_end= query_get_value(SHOW MASTER STATUS, Position,
> 1);
>>>> +
>>>> +connection slave;
>>>> +let $slave_param= Exec_Master_Log_Pos;
>>>> +let $slave_param_value= $master_binlog_end;
>>>> +source include/wait_for_slave_param.inc;
>>>> +--echo *** must be empty as the event is to be filtered out ***
>>>> +show tables;
>>>> +--echo *** allowing events from master ***
>>>> +let $slave_relay_pos1= query_get_value(SHOW SLAVE STATUS, Relay_Log_Pos,
> 1);
>>>> +#
>>>> +# checking stability of relay log pos
>>>> +#
>>>> +if (`select $slave_relay_pos1 - $slave_relay_pos0`)
>>>> +{
>>>> +    --echo Error: relay log position changed:  $slave_relay_pos0,
> $slave_relay_pos1
>>>> +    query_vertical show slave status;
>>>>   
>> 
>>> This will print a lot of info that will cause the test to fail in
>>> pushbuild.
>> 
>> Sorry, 
>> this will print a lot of info   if  the test will fail in pushbuild.
>> 
>>>  Use "include/show_slave_status.inc" or something equivalent
>>> to filter out the pointless columns.
>>>
>> 
>> I think you agree that's how it should be in the case of an error of
>> a test.
>
> Hmm... OK.
>
>> 
>>>> +}
>>>> +stop slave;
>>>> +source include/wait_for_slave_to_stop.inc;
>>>> +reset slave;
>>>> +--echo *** must be empty due to reset slave ***
>>>> +let $ignore_list= query_get_value(SHOW SLAVE STATUS,
> Replicate_Ignore_Server_id, 1);
>>>> +echo ignore server id list: $ignore_list;
>>>> +--replace_result $MASTER_MYPORT MASTER_PORT
>>>> +eval change master to master_host='127.0.0.1', master_user='root',
>>>> + master_password='', master_port=$MASTER_MYPORT;
>>>>   +start slave;
>>>>   
>>> Wait for slave to start.
>>>
>> 
>> sorry, no need as explained.
>
> See above.

let's vote.

>
> [snip]
>
>> 
>>>> diff -Nrup a/sql/lex.h b/sql/lex.h
>>>> --- a/sql/lex.h	2008-04-01 13:56:38 +03:00
>>>> +++ b/sql/lex.h	2008-05-08 21:00:32 +03:00
>>>> @@ -321,6 +321,7 @@ static SYMBOL symbols[] = {
>>>>    { "MASTER_SSL_VERIFY_SERVER_CERT",
> SYM(MASTER_SSL_VERIFY_SERVER_CERT_SYM)},
>>>>    { "MASTER_USER",           SYM(MASTER_USER_SYM)},
>>>>    { "MASTER_HEARTBEAT_PERIOD", SYM(MASTER_HEARTBEAT_PERIOD_SYM)},
>>>> +  { "MASTER_IGNORE_SERVER_ID", SYM(MASTER_IGNORE_SERVER_ID_SYM)},
>>>>   
>> 
>>> What is "MASTER" referring to here? For "MASTER_USER" it is clearly
>>> the user on the master, but what does the "MASTER" denote for
>>> "IGNORE_SERVER_ID"?
>> 
>> a good point. I seemed to had been hypnotized with that all the options
>> having MASTER_ prefix. It's not really so, there are two that do not
>> have. This drop off my early hesitations.
>> 
>> (todo)
>
> Does this mean you remove the MASTER_ for this option?
>

yes

> [snip]
>
>
>>>>      Check that the master's global character_set_server and ours are the
> same.
>>>>      Not fatal if query fails (old master?).
>>>> @@ -1238,6 +1334,10 @@ bool show_master_info(THD* thd, Master_i
>>>>    field_list.push_back(new Item_empty_string("Last_IO_Error", 20));
>>>>    field_list.push_back(new Item_return_int("Last_SQL_Errno", 4,
> MYSQL_TYPE_LONG));
>>>>    field_list.push_back(new Item_empty_string("Last_SQL_Error", 20));
>>>> +  field_list.push_back(new
> Item_empty_string("Replicate_Ignore_Server_id",
>>>> +                                             FN_REFLEN));
>>>>   
>> 
>>> Please be consistent: "Replication_Ignore_Server_Id" 
>> 
>> I am not sure what would your version be consistent with.
>> The idea of the feature is to filter. And show slave status displays
>> 
>> Replicate_Ignore_DB, Replicate_Ignore_Table, other Replicate_, never
>> Replication_.
>
> Sorry, I was referring to the capitalization of the "id", not the choice
> of "Replication", that was a typo.

ok, that was really hard to guess :-)

>
> Just my few cents,
> Mats Kindahl
>
> -- 
> Mats Kindahl
> Lead Software Developer
> Replication Team
> MySQL AB, www.mysql.com
>
>
> -- 
> MySQL Code Commits Mailing List
> For list archives: http://lists.mysql.com/commits
> To unsubscribe:    http://lists.mysql.com/commits?unsub=1

I have committed a new cset that covers all items what we have agreed
on. 

Let's vote on `reset slave' resets IGNORE_SERVER_ID list;
I invited you to test your claim about `start slave' can return before
the SQL thread has started;

And I hope you and me will be more tolerant to reading somebody's
comments. A motto for us as a team should be - try to understand.
I personally find more productive to call a guy once I am in such a
place that i need immediate help to understand, instead of mailing
that causes interruption.
I believe, if there is a roughness in comments, a clever one (which we
always are :-) will find a clue effortlessly from the source code in
most of the cases.

cheers,

Andrei
Thread
bk commit into 6.0 tree (aelkin:1.2634) BUG#25998Andrei Elkin8 May 2008
  • Re: bk commit into 6.0 tree (aelkin:1.2634) BUG#25998Mats Kindahl20 May 2008
    • Re: bk commit into 6.0 tree (aelkin:1.2634) BUG#25998Andrei Elkin22 May 2008
      • Re: bk commit into 6.0 tree (aelkin:1.2634) BUG#25998Mats Kindahl28 May 2008
        • Re: bk commit into 6.0 tree (aelkin:1.2634) BUG#25998Andrei Elkin30 May 2008
          • Re: bk commit into 6.0 tree (aelkin:1.2634) BUG#25998Mats Kindahl2 Jun 2008
Re: bk commit into 6.0 tree (aelkin:1.2634) BUG#25998Andrei Elkin3 Jun 2008
  • Re: bk commit into 6.0 tree (aelkin:1.2634) BUG#25998Mats Kindahl3 Jun 2008