List:Commits« Previous MessageNext Message »
From:Luís Soares Date:January 28 2010 2:59pm
Subject:Re: bzr commit into mysql-5.1-rep-semisync branch (Li-Bing.Song:3127)
Bug#50157
View as plain text  
Hi Libing,
  Nice Work. 
  Please find my review comments below.


Regards,
Luís

STATUS
------
 
  Not Approved. (Requested some changes and clarification).

REQUIRED CHANGES
----------------

  RC1. Please, improve the comments. See comments inline.
 
  RC2. Please, make the comments doxygen style compliant.

REQUESTS
--------

  R1. Let me see if I got this straight. This is about memory
      management. You use the following check you use in
      free_nodes_before:

        ((&(block->nodes[0]) <= node) && 
         (&(block->nodes[BLOCK_TRANX_NODES]) >= node))

      If a transaction requires more than BLOCK_TRANX_NODES, then
      it needs two blocks. If at that time,
      free_nodes_before(nodeY) gets called, and we are in the
      following scenario:
 
        1. nodeY is logically after nodeX;
        2. &nodeY is in the second block;
        3. and malloc has assigned the memory for this second
           block from some chunk that was freed before and was
           lying around in its list of free chunks (this is a
           possible implementation of malloc - glibc uses it I
           think);
        4. and the first block was not taken from the list of
           free memory chunks (meaning that it required a call to
           brk, for instance).

      Wouldn't this mean that for nodeY (which is logically after
      nodeX), would be considered before nodeX? Wouldn't this
      mess things up?

SUGGESTIONS
-----------

  S1. In sql/rpl_handler.cc, maybe we could avoid the changes
      regarding the initialization for Trans_param if we have a
      default constructor for the structure in sql/replication.h

       typedef struct Trans_param {

         (...)

         Trans_param() : server_id(0), 
           flags(0), log_file(NULL), log_pos(0) {};

         (...)

       } Trans_param;

      I think that, yet another different approach, with less
      changes, would be:

        -  Trans_param param;
        +  Trans_param param= {0};

     Have you considered? Your changes are OK, just asking
     because in most places three out of four Trans_param members
     are initialized and only if FOREACH_OBSERVER is called the
     server_id member is initialized... Just feels like a loose
     end... Anyway, nothing to be worried about... Your changes
     are also fine. I leave it up to your consideration!

DETAILS 
-------
 See more comments inline.


On Wed, 2010-01-27 at 07:49 +0000, Li-Bing.Song@stripped wrote:
> #At file:///home/anders/work/bzrroot/mysql-5.1-rep-semisync/ based on
> revid:zhenxing.he@stripped
> 
>  3127 Li-Bing.Song@stripped	2010-01-27
>       BUG#50157 Assertion !active_tranxs_->is_tranx_end_pos(..) in
> ReplSemiSyncMaster::commitTrx
>       
>       The root cause of the crash is that a TranxNode is freed before it is used.
>       A TranxNode is allocated and insertted into the active list each time 
>       when some log events are written into binlog file and is flushed. 

suggestion: each time a log event is written and flushed into 
            the binlog file.

>       The memory for TranxNode is allocted with thd_alloc and will be freed 

typo: allocted -> allocated

>       after at the end of the statement. The after_commit/after_rollback callback

suggestion: at the end of the statement

>       was supposed to be call before the end of each statement and remove the node
> from

to be called 

>       the active list. However this assumption is not correct in all cases(e.g.
> CREATE
>       TEMPORARY ... SELECT), and can cause the memory allocated for TranxNode be
> freed
>       before it was removed from the active list. So The TranxNode pointer in the
> active
>       list would become a wild pointer and cause the crash.

I would probably improve a bit the problem description... See 
my question below.

Question: Is the after_commit/after_rollback hook called at 
          all in CREATE TEMPORARY ... SELECT ? If so, when and 
          why does it differ? 

>       After this patch, The memory for TranxNode will be allocated by my_malloc.

If I understood correctly, the patch introduces a memory manager 
for TranxNode which is backed up by my_malloc.

>      @ sql/rpl_handler.cc
>         params are not initialized.
> 
>     modified:
>       mysql-test/suite/rpl/r/rpl_semi_sync.result
>       mysql-test/suite/rpl/t/rpl_semi_sync.test
>       plugin/semisync/semisync_master.cc
>       plugin/semisync/semisync_master.h
>       sql/rpl_handler.cc

[snip]

> === modified file 'plugin/semisync/semisync_master.h'
> --- a/plugin/semisync/semisync_master.h	2009-12-04 01:46:33 +0000
> +++ b/plugin/semisync/semisync_master.h	2010-01-27 07:48:44 +0000
> @@ -20,6 +20,196 @@
>  
>  #include "semisync.h"
>  
> +struct TranxNode {
> +  char             log_name_[FN_REFLEN];
> +  my_off_t          log_pos_;
> +  struct TranxNode *next_;            /* the next node in the sorted list */
> +  struct TranxNode *hash_next_;    /* the next node during hash collision */
> +};
> +
> +/*
> +  TranxNodeAllocator provides memory allocating and freeing methods for
> +  TranxNode.
> +
> +  BLOCK_TRANX_NODES
> +    The number of TranxNodes which are in a block.

Maybe move the definition of Block before this one, so that
BLOCK_TRANX_NODES can be defined on something that is 
already known.

> +  Block
> +    A sequence memory which contain BLOCK_TRANX_NODES TranxNodes.
> +    The Block link table is composed of number of Blocks.
> +  last_node
> +    it always points to the last node in current_block which is in use.

Same with current_block and last_node.

> +  current_block
> +    Part of its nodes are free and others are used. The Blocks before it are
> +    all in use and the blocks after it are all free.
> +  allocate_node
> +    The pointer of the first free node in current_block is returned to the
> +    caller. current_block will move to the next free block when all nodes of
> +    it are in use. A new block is allocated and is put into the rear of the
> +    Block link table if no block is free. At first, there is no any Block in
> +    the Block link table. A Block is allocated and is put into the Block link
> +    table;

suggestion: Replace: "At first ..." with "The list starts up empty
            (ie, there is no allocated Block)."

> +  free_nodes_before(TranxNode *node)
> +    All blocks before the block which owneres the node gaving by the caller are

typo: owneres, gaving

> +    put into the rear of the Block link table. The block containing
> +    the node gaving by the caller is not move to the rear of the Block link
> +    table, for some nodes of its are still in use. We only say a block is free
> +    if all its node are free. This will waste at most one block, but it is more
> +    efficient.

OK, I'm having a hard time to read this. Can you make it simpler?
Can you explain more ? I think I get the picture, but we need to
make this comment more clear.

Suggestion: A block is said to be free if all its nodes are free. As
            such, all blocks before the given one ("node") are moved
            into the rear of the block link table. The block containing
            "node", however, is not. It may still have some other nodes
            in use. This will waste at most one block, but it is more
            efficient.

> + */
> +#define BLOCK_TRANX_NODES 16
> +class TranxNodeAllocator
> +{
> +public:
> +  /*
> +    init_nodes
> +      How many TranxNodes are usually need.
> +    max_reserved_block_num
> +      How many blocks are need to cantain the TranxNodes of init_nodes.

typo: cantain -> contain

> +      Some free Blocks will be freed if there are too many Blocks in the
> +      Block link table.
> +   */
> +  TranxNodeAllocator(uint init_nodes) :
> +    max_reserved_block_num(init_nodes/BLOCK_TRANX_NODES +
> +                  (init_nodes % BLOCK_TRANX_NODES > 1 ? 2 : 1)),
> +    last_node(-1), first_block(NULL), last_block(NULL),
> +    current_block(NULL), block_num(0) {}
> +
> +  ~TranxNodeAllocator()
> +  {
> +    Block *block= first_block;
> +    while (block != NULL)
> +    {
> +      Block *next= block->next;
> +      free_block(block);
> +      block= next;
> +    }
> +  }
> +
> +  TranxNode *allocate_node()
> +  {
> +    TranxNode *trx_node;
> +    Block *block= current_block;
> +
> +    if (last_node == BLOCK_TRANX_NODES-1)
> +    {
> +      current_block= current_block->next;
> +      last_node= -1;
> +    }
> +
> +    if (current_block == NULL && allocate_block())
> +    {
> +      current_block= block;
> +      if (current_block)
> +        last_node= BLOCK_TRANX_NODES-1;
> +      return NULL;
> +    }
> +
> +    trx_node= &(current_block->nodes[++last_node]);
> +    trx_node->log_name_[0] = '\0';
> +    trx_node->log_pos_= 0;
> +    trx_node->next_= 0;
> +    trx_node->hash_next_= 0;
> +    return trx_node;
> +  }
> +
> +  int free_all_nodes()
> +  {
> +    current_block= first_block;
> +    last_node= -1;
> +    free_blocks();
> +    return 0;
> +  }
> +
> +  int free_nodes_before(TranxNode* node)
> +  {
> +    Block *block;
> +    Block *prev_block;
> +
> +    block= first_block;
> +    while (block != current_block->next)
> +    {

Please, add some comments here, to describe the interaction between
the pointers.

> +      if (&(block->nodes[0]) <= node &&
> &(block->nodes[BLOCK_TRANX_NODES]) >= node)
> +      {
> +        if (first_block != block)
> +        {
> +          last_block->next= first_block;
> +          first_block= block;
> +          last_block= prev_block;
> +          last_block->next= NULL;
> +          free_blocks();
> +        }
> +        return 0;
> +      }
> +      prev_block= block;
> +      block= block->next;
> +    }
> +
> +    /* Node does not find should never happen */
> +    DBUG_ASSERT(0);
> +    return 1;
> +  }
> +
> +private:
> +  uint max_reserved_block_num;
> +  int last_node;
> +
> +  struct Block {
> +    Block *next;
> +    TranxNode nodes[BLOCK_TRANX_NODES];
> +  };
> +  Block *first_block;
> +  Block *last_block;
> +  Block *current_block;
> +  uint block_num;
> +
> +  int allocate_block()
> +  {
> +    Block *block= (Block *)my_malloc(sizeof(Block), MYF(0));
> +    if (block)
> +    {
> +      block->next= NULL;
> +
> +      if (first_block == NULL)
> +        first_block= block;
> +      else
> +        last_block->next= block;
> +
> +      /* New block always is put at the rear */
> +      last_block= block;
> +      /* New block always is the current_block */
> +      current_block= block;
> +      ++block_num;
> +      return 0;
> +    }
> +    return 1;
> +  }
> +
> +  void free_block(Block *block)
> +  {
> +    my_free(block, MYF(0));
> +    --block_num;
> +  }
> +
> +  void free_blocks()
> +  {
> +    if (current_block == NULL || current_block->next == NULL)
> +      return;
> +
> +    /* One free block is always kept behind the current block */
> +    Block *block= current_block->next->next;
> +    while (block_num > max_reserved_block_num && block != NULL)
> +    {
> +      Block *next= block->next;
> +      free_block(block);
> +      block= next;
> +    }
> +    current_block->next= block;

Perhaps this ^ is current_block->next->next?

> +    if (block == NULL)
> +      last_block= current_block;
> +  }
> +};
> +
> +
>  /**
>     This class manages memory for active transaction list.
>  
> @@ -31,13 +221,8 @@
>  class ActiveTranx
>    :public Trace {
>  private:
> -  struct TranxNode {
> -    char             log_name_[FN_REFLEN];
> -    my_off_t          log_pos_;
> -    struct TranxNode *next_;            /* the next node in the sorted list */
> -    struct TranxNode *hash_next_;    /* the next node during hash collision */
> -  };
>  
> +  TranxNodeAllocator allocator_;
>    /* These two record the active transaction list in sort order. */
>    TranxNode       *trx_front_, *trx_rear_;
>  
> @@ -48,24 +233,22 @@ private:
>  
>    inline void assert_lock_owner();
>  
> -  inline TranxNode* alloc_tranx_node();
> -
>    inline unsigned int calc_hash(const unsigned char *key,unsigned int length);
>    unsigned int get_hash_value(const char *log_file_name, my_off_t log_file_pos);
>  
>    int compare(const char *log_file_name1, my_off_t log_file_pos1,
> -	      const TranxNode *node2) {
> +              const TranxNode *node2) {
>      return compare(log_file_name1, log_file_pos1,
> -		   node2->log_name_, node2->log_pos_);
> +                   node2->log_name_, node2->log_pos_);
>    }
>    int compare(const TranxNode *node1,
> -	      const char *log_file_name2, my_off_t log_file_pos2) {
> +              const char *log_file_name2, my_off_t log_file_pos2) {
>      return compare(node1->log_name_, node1->log_pos_,
> -		   log_file_name2, log_file_pos2);
> +                   log_file_name2, log_file_pos2);
>    }
>    int compare(const TranxNode *node1, const TranxNode *node2) {
>      return compare(node1->log_name_, node1->log_pos_,
> -		   node2->log_name_, node2->log_pos_);
> +                   node2->log_name_, node2->log_pos_);
>    }
>  
>  public:
> @@ -88,7 +271,7 @@ public:
>     *  0: success;  non-zero: error
>     */
>    int clear_active_tranx_nodes(const char *log_file_name,
> -			       my_off_t    log_file_pos);
> +                               my_off_t    log_file_pos);
>  
>    /* Given a position, check to see whether the position is an active
>     * transaction's ending position by probing the hash table.
> @@ -99,7 +282,7 @@ public:
>     * (file_name, file_position).
>     */
>    static int compare(const char *log_file_name1, my_off_t log_file_pos1,
> -		     const char *log_file_name2, my_off_t log_file_pos2);
> +                     const char *log_file_name2, my_off_t log_file_pos2);
>  
>  };
>  
> 
> === modified file 'sql/rpl_handler.cc'
> --- a/sql/rpl_handler.cc	2009-12-04 01:46:33 +0000
> +++ b/sql/rpl_handler.cc	2010-01-27 07:48:44 +0000
> @@ -190,8 +190,8 @@ int Trans_delegate::after_commit(THD *th
>  {
>    Trans_param param;
>    bool is_real_trans= (all || thd->transaction.all.ha_list == 0);
> -  if (is_real_trans)
> -    param.flags |= TRANS_IS_REAL_TRANS;
> +
> +  param.flags = is_real_trans ? TRANS_IS_REAL_TRANS : 0;

See S1. for this ^ .

>    Trans_binlog_info *log_info=
>      my_pthread_getspecific_ptr(Trans_binlog_info*, RPL_TRANS_BINLOG_INFO);
> @@ -218,8 +218,8 @@ int Trans_delegate::after_rollback(THD *
>  {
>    Trans_param param;
>    bool is_real_trans= (all || thd->transaction.all.ha_list == 0);
> -  if (is_real_trans)
> -    param.flags |= TRANS_IS_REAL_TRANS;
> +
> +  param.flags = is_real_trans ? TRANS_IS_REAL_TRANS : 0;
>  
>    Trans_binlog_info *log_info=
>      my_pthread_getspecific_ptr(Trans_binlog_info*, RPL_TRANS_BINLOG_INFO);
> @@ -228,7 +228,7 @@ int Trans_delegate::after_rollback(THD *
>    param.log_pos= log_info ? log_info->log_pos : 0;
>  
>    int ret= 0;
> -  FOREACH_OBSERVER(ret, after_commit, thd, (&param));
> +  FOREACH_OBSERVER(ret, after_rollback, thd, (&param));
>  
>    /*
>      This is the end of a real transaction or autocommit statement, we
> 


Thread
bzr commit into mysql-5.1-rep-semisync branch (Li-Bing.Song:3127) Bug#50157Li-Bing.Song27 Jan
  • Re: bzr commit into mysql-5.1-rep-semisync branch (Li-Bing.Song:3127)Bug#50157He Zhenxing28 Jan
    • Re: bzr commit into mysql-5.1-rep-semisync branch (Li-Bing.Song:3127)Bug#50157Libing Song29 Jan
      • Re: bzr commit into mysql-5.1-rep-semisync branch (Li-Bing.Song:3127)Bug#50157He Zhenxing29 Jan
  • Re: bzr commit into mysql-5.1-rep-semisync branch (Li-Bing.Song:3127)Bug#50157Luís Soares28 Jan
    • Re: bzr commit into mysql-5.1-rep-semisync branch (Li-Bing.Song:3127)Bug#50157Libing Song29 Jan
      • Re: bzr commit into mysql-5.1-rep-semisync branch (Li-Bing.Song:3127)Bug#50157Luís Soares29 Jan