List:Commits« Previous MessageNext Message »
From:Andrei Elkin Date:June 11 2009 2:08pm
Subject:Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720
View as plain text  
Zhen Xing,

> OK, I think we do not need to solve all problems right now, we only need
> to address the critical issues, and can handle other non-critical issues
> after push.

As Lars confirmed and explained why I don't object.
I am not quite sure where you are going to push, I guess that
is a staging tree, not a main tree.

In the following text I annotated with [estimate] my findings.
I list to the critical also problems dealing with the user interface.

>
>> I would rather value 1,3 as important to fix, although not critical.
>> 2 is not critical and might be the beginning of a discussion.
>> 
>> 
>> 
>> 1. status vs variables name convention on the master:
>> 
>>    E.g Rpl_semi_sync_master_status  rpl_semi_sync_master_enabled 
>> 
>>
...
> The convention for MySQL is that system variables are in lower-case, and
> for status variables, the first letter is in upper-case. I think we
> should follow that convention.

ok, thanks Paul too.

>
> About the use prefix 'rplug_' or 'rpl_plug_', I think it's not necessary
> to mention 'plug' in the name, because:
>  1) we have storage plug-ins, and do not have 'plug' in those variable
> names
>  2) I don't think we need to distinct variables defined by plug-in and
> sever kernel. If we later move a feature out of the kernel and make it a
> plug-in, I don't think we should also change the variable names.

there is sense in that too. I would agree with your arguments.
That means an option to provide info on the origin of a
variable/status should be implemented somehow differently than by
a name convention.

>
>> 2. rpl_semi_sync_master_timeout ==  0    | > 0   could mean 
>>    rpl_semi_sync_master_enabled == FALSE | TRUE
>> 
>
> I think using two variable is more explicit.
>
>>    and the slave's intent to connect semi-syncronously to be
>>    express via CHANGE MASTER's new parameter having the meaning
>>    of `rpl_semi_sync_slave_enabled'.
>>   
>
> I don't think these components added any new parameter for CHANGE
> MASTER.

Ok, I would oppose myself wrt this idea to have 
a policy of forcing the session to stay in one of
START SLAVE semi_sync = true | [false, default].
And the reason is the semi-synchronous communication
is allowed to switch on/off during a single master-slave session
(at least automatically, see below).
Maybe that makes sense being more flexible.

>
>>    That would make `rpl_semi_sync_{master,slave}_enabled' unnecessary
>>    (and therefore release from a dependency like
>>      rpl_semi_sync_master_timeout ==  0  => rpl_semi_sync_master_enabled ==
> FALSE
>>    (Just have tested: timeout == 0 and enable == TRUE can be at the
>>    same time).
>> 
>
> Maybe we could make rpl_semi_sync_master_timeout == 0 means wait for
> ever.

Well, that would be *over*-stretching. I dare to claim semantics of
*all* timeout == 0 is the same across the mysql's code. 
As to forever - a typical solution is to reserve a magic
value in the range representing a type. It may not be zero for
timeout, under all my respect to your imagination :-).

[critical]

More important, while semi/2 logics does not think on 

   rpl_semi_sync_master_timeout ==  0 

as to wait forever

The setting

   set global rpl_semi_sync_master_timeout= 0

should force 

   Rpl_semi_sync_master_status == FALSE


>
>> 3. Attempt to set an out-of-range value succeeds.
>> 
>>    set global rpl_semi_sync_master_timeout= -1;
>> 
>>    show warnings >
>>      Warning | 1292 | Truncated incorrect timeout value: '18446744073709551615'
>> 
>>    show variables like >
>>       rpl_semi_sync_master_timeout            | 4294967295 
>> 
>>    This might be a separate and rather general issue as many other
>>    variables tolerate an out-of-range assignments. However when some
>>    change the current value to accept an incorrect (the out-of-range)
>>    that is an issue.
>>    
>>    I think there should be an error thrown as out-of-range input
>>    value, and at least not to accept an incorrect value.
>>    It's safer to be conservative to assume Warnings are not always read.
>>  
>>    The same applies 
>>    rpl_semi_sync_slave_trace_level, rpl_semi_sync_master_trace_level
>>    that must accept only positive integer values.
>> 
>
> Yes, this is a problem, but since it's a general problem, I think it
> should be handled as a server bug.
>
>> 4.  It makes sense to group the two members of 
>>  
>>     struct TranxNode {
>> 
>>        char             *log_name_;
>>        my_off_t          log_pos_;
>> 
>>     into an existing in 6.0 struct event_coordinates (added by hb)
>>     because the two go together most of the time, which is naturally.
>> 
>
> These components are supposed to be independent from server, and should
> not use any server internal structures(THD, MASTER_INFO, etc). only the
> replication interface defined by WL#4398, interfaces defined in plugin.h
> and cient API should be used.

I believe it's an over-bend.
I read it `to be independent' in the functional sense, not in the
build sense. Nothing prevents a plugin writer (unless he's 
a some other db addict hating our code) to use the server's
sources. 
Consider the client programs, they are far stronger
separated from the server, as an example.

>
>> 
>> 5.  A first look on testing makes me wondering if ratio of the new
>>     feature code lines per the only test is fine... 
>>     Sorry, that does not sound like creative, I'll figure out the rest
>>     tomorrow.
>> 
>
> You're right, that is only a simple feature test, and WL#4873 is
> supposed to test this more completely.

I fear that is not pushed.
If that's true, and per Lars explanations the current patch is for QA 
I wonder if QA won't force us to push the testing WL too.

>
>> 
>> Open issues:
>> 
>> O1. How about rpl_semi_sync.test with engine= FALCON?
>
> I have not did a lot of test, the patch is supposed to be engine
> independent.

[not critical]

It is supposed, correct.
Although I don't expect Falcon to create a serious issue,
still, minimal efforts to create a test for a
2nd engine may reward in less wasted time for bug-fixing.



>
>> O2. I think there should be some interleaving with the repl heartbeat;
>>     Have you thought on any compliction in this area? I need some more
>>     time of digging into the sync/2 impl to be sure everything is
>>     safe.
>> 
>
> I don't see any problem with rpl hb, the event generated by hb is not
> written into binary log, and will not cause a wait on master and there
> will be not reply from slave for hb events.

Ok.


[critical]

Since semi/2 intervenes in the delicate commit procedure

   ha_commit_trans ()

       DBUG_EXECUTE_IF("crash_commit_after", DBUG_ABORT(););
       RUN_HOOK(transaction, after_commit, (thd, all));

i suggest to swap the two lines to extend the after-commit crash
to semi/2.
The idea is that w/o the semi/2 plugged-in, the test would be same as
before, with the plugin we may see something different.


[not critical]

N1. show processlist does not display the slave side semi/2 thread as 
    does so for the master's.

[critical]
       
N2. it is not clear what is the extent of keeping the ON-semi/2-status
    by master and slave. Per my experiments the status is persistent
    during the session time. It flips over,  e.g 1 -> 0, in doing 

     set global rpl_semi_sync_slave_enabled = 0; stop slave;

    Without `slave slave' it does not change.
    However it can change automatically because of a transaction times
    out. Thefore, I expect `set global rpl_semi_sync_slave_enabled =  0'
    to have an immediate effect.
    
    If it could not be done, it should be either
    documented, but ideally the flag would be turned into the option
    disccussed above: START SLAVE semi_sync=on|off to relax us from
    explanation why the status is OFF whereas the corresponding
    flag is ON (and maybe vice versa).

As I am limited with time in doing this review, here is the final 

[ critical ]

I second Alfranio's concern about an access to non-volatile object
outside of its mutex' scope.


cheers,

Andrei









Thread
bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720He Zhenxing31 May
  • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Alfranio Correia7 Jun
    • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Alfranio Correia9 Jun
      • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Alfranio Correia10 Jun
        • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720He Zhenxing10 Jun
          • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720He Zhenxing11 Jun
    • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720He Zhenxing10 Jun
      • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Alfranio Correia10 Jun
        • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Mats Kindahl10 Jun
          • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Alfranio Correia10 Jun
            • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Mats Kindahl10 Jun
              • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Alfranio Correia10 Jun
                • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Mats Kindahl10 Jun
  • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Andrei Elkin10 Jun
    • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Paul DuBois10 Jun
      • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720He Zhenxing11 Jun
    • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720He Zhenxing11 Jun
      • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Andrei Elkin11 Jun
        • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720He Zhenxing11 Jun
          • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720Andrei Elkin11 Jun
            • Re: bzr commit into mysql-6.0-rpl branch (zhenxing.he:2858) WL#1720He Zhenxing12 Jun