Mats,
Thank you very much for the insightful explanation.
Just one comment/question in-line.
Mats Kindahl wrote:
> Hi!
>
> Alfranio Correia wrote:
>
>> Hi Jasonh,
>>
>> Thank you for the reply.
>> See some comments in-line.
>>
>> Cheers.
>>
>> He Zhenxing wrote:
>>
>>> Hi Alfranio,
>>>
>>> Thank you for the review!
>>>
>>> Alfranio Correia wrote:
>>>
>>>
>>>
>>>>>
>>>>>
>>>>>
>>>> I haven't followed the comment.
>>>> Can you elaborate more on that?
>>>>
>>>>
>>> I have to say I'm not quite sure about the comment either. I guess it
>>> means that the values of system variables will be initialized after
>>> set_options(). I'll check more on that.
>>>
>>>
>>>
>> ok.
>>
>>>>> +
>>>>> + /* Mutex initialization can only be done after MY_INIT(). */
>>>>> + pthread_mutexattr_t mutexattr;
>>>>> + pthread_mutexattr_init(&mutexattr);
>>>>> + pthread_mutexattr_settype(&mutexattr,
>>>>> + PTHREAD_MUTEX_FAST_NP);
>>>>> +// pthread_mutex_init(&LOCK_binlog_, MY_MUTEX_INIT_FAST);
>>>>> + pthread_mutex_init(&LOCK_binlog_, &mutexattr);
>>>>>
>>>>>
>>>>>
>>>>> +{
>>>>> + const char *kWho = "ReplSemiSyncMaster::reportReplyBinlog";
>>>>> + int cmp;
>>>>> + bool can_release_threads = false;
>>>>> + bool need_copy_send_pos = true;
>>>>>
>>>>>
>>>>>
>>>> Why are you checking the variable before acquiring a lock?
>>>>
>>>>
>>> This is the so-called double-checked locking pattern. It avoids
>>> acquiring the lock if the value is false.
>>>
>>> http://en.wikipedia.org/wiki/Double_checked_locking_pattern
>>>
>>>
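For readers following along, here is a minimal sketch of the check-before-lock idea for a boolean flag. It is not the code from the patch: WaitState, enabled, replies and report_reply() are hypothetical names, and it assumes a compiler with C++0x (C++11) std::atomic and std::mutex, which makes the unlocked read well defined. It also assumes that writers only change the flag while holding the lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical state; names are illustrative, not taken from the patch.
struct WaitState {
  std::atomic<bool> enabled{false};  // read without the lock on the fast path
  std::mutex lock;                   // protects the fields below
  int replies = 0;                   // guarded by lock
};

// Returns true if the reply was recorded, false if the feature was off.
bool report_reply(WaitState &s) {
  // First check, no lock taken: cheap exit when the feature is off.
  if (!s.enabled.load(std::memory_order_acquire))
    return false;

  std::lock_guard<std::mutex> guard(s.lock);
  // Second check under the lock: the flag may have changed in between.
  // Relaxed is enough here on the assumption that writers hold the lock.
  if (!s.enabled.load(std::memory_order_relaxed))
    return false;

  ++s.replies;
  return true;
}

// Small single-threaded usage example.
void usage_example() {
  WaitState s;
  assert(!report_reply(s));  // feature off: nothing recorded
  s.enabled.store(true, std::memory_order_release);
  assert(report_reply(s));   // feature on: reply recorded
  assert(s.replies == 1);
}
```

The point of the atomic flag is that the first read no longer depends on what the compiler or CPU happens to do with a plain variable.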
>> Ok. I did not know this pattern. However, I've checked
>> Schmidt, D. et al., Pattern-Oriented Software Architecture Vol. 2, 2000,
>> pp. 353-363,
>> and it seems that there are problems with some platforms and compiler
>> optimizations.
>>
>> 1 - First make the variable volatile to avoid caching it in a register.
>> 2 - Check with Mats if MySQL runs on Compaq Alpha or Intel Itanium. In
>> such platforms,
>> if the cpu cache is not properly flushed if shared data is not accessed
>> without locks.
>>
>
>
Fix: on such platforms, the CPU cache is not properly flushed if shared data
*is accessed* without locks.
> I am pretty sure that we support Intel Itanium. Regardless, I think that we have
> to do that. Alpha is less of a problem, since it is very rare.
>
> However, the double-checked locking pattern does not work in C++ because the
> compiler is allowed to do optimizations assuming that only a single thread is
> executing. This means that the construction of an object is generally not
> linearizable, meaning that it has an "initialization period" and does not appear
> to be atomic. The code you have is different, so I take the canonical example:
>
> extern Object *ptr;
> ...
> if (ptr == 0) {
> lock();
> if (ptr == 0) {
> ptr = new Object;
> ...
> }
> }
>
Could such optimizations be avoided by using the volatile modifier?
> What normally happens in the compiler is that memory is allocated and assigned
> to the variable in question, after which the constructor is called. This means
> that another thread that starts reading the same pointer may discover a pointer
> that is already non-null and will therefore read uninitialized memory. In other
> words, the compiler can compile the code above so that it behaves as if it were
> written the following way:
>
> extern Object *ptr;
> ...
> if (ptr == 0) {
> lock();
> if (ptr == 0) {
> ptr = static_cast<Object*>(::operator new(sizeof(Object)));
> new (ptr) Object;
> ...
> }
> }
>
> Note that this is not a rare case, but is quite commonly done by compilers. In
> general, I would not trust double-checked locking as a technique without special
> support from the compiler (e.g., by using GCC's intrinsic function
> __compare_and_swap(), and even then one has to be careful and know what a
> compiler can do with the code).
>
> I think this will change with C++0x, but I haven't considered this case with the
> new semantics yet...
>
> /Matz
>
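As a reference point for the C++0x remark, here is a minimal sketch of how the canonical example might be written with the std::atomic facility from C++0x (published as C++11). This is an assumption about how one could do it, not something taken from the patch or from Mats's mail: ptr_lock and get_instance() are hypothetical names, and Object is a stand-in type. The release store publishes the pointer only after the constructor has finished, and the acquire load guarantees that a thread seeing a non-null pointer also sees the constructed object.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Stand-in for the Object type of the canonical example.
struct Object {
  int value;
  Object() : value(42) {}
};

std::atomic<Object *> ptr{nullptr};
std::mutex ptr_lock;  // hypothetical name for the lock in the example

// Returns the shared instance, creating it on first use.
// The instance is intentionally never freed in this sketch.
Object *get_instance() {
  // Acquire load: if we see a non-null pointer, we also see the
  // constructor's writes that preceded the release store below.
  Object *p = ptr.load(std::memory_order_acquire);
  if (p == nullptr) {
    std::lock_guard<std::mutex> guard(ptr_lock);
    // Second check under the lock; the mutex orders this load.
    p = ptr.load(std::memory_order_relaxed);
    if (p == nullptr) {
      p = new Object;  // fully constructed in a local first
      ptr.store(p, std::memory_order_release);  // then published
    }
  }
  return p;
}
```

Constructing into a local and publishing it with a release store is what rules out the "allocate, assign, then construct" ordering described above.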
>
>>
>>>>>
>>>>>
>>>>>
>>>> Can you explain the variable below?
>>>>
>>>>
>>>>> + unsigned long wait_backtraverse_; /* wait position back traverses */
>>>>>
>>>>>
>>> Usually the position being waited for should increase. This variable
>>> records the number of times that the position being waited for decreases,
>>> that is, the later-written binlog has a smaller position.
>>>
>>> I have not checked whether this can happen; it can probably be caused by
>>> the InnoDB engine.
>>>
>>>
>>>
>> Maybe due to the internal 2PC, when the XID does not get into the binary
>> log.
>> Try to create a test case for this, maybe using the following fault
>> injection
>> string "crash_before_writing_xid" (check log.cc:6340).
>>
>>
>
>