List:Commits« Previous MessageNext Message »
From:Rafal Somla Date:September 10 2007 5:38pm
Subject:Re: bk commit into 5.1 tree (cbell:1.2570) BUG#30790
View as plain text  
Hi Chuck,

I'm reading the code for packing/unpacking table column information in 
log_event.{h,cc} and rpl_utility.{h,cc} and this is how I understand it:

1. Information is stored inside Table_map event in a fixed byte-order as defined 
by Table_map_log_event::save_field_metadata().

2. In slave, this data is copied to m_field_metadata member of Table_map_event 
as it is. Then it is parsed in the constructor of table_def class which fills
table_def::m_field_metadata member.

3. The table_def::m_field_metadata member is declared as an array of uint16 with 
the intention that a single entry can contain either a (decoded) length of a 
field or, for some field types, length together with other info in the high and 
low byte (e.g. size and precision for decimals).

If I were writing this code, I would make the design in point 3 more explicit by 
declaring m_field_metadata as an array of unions, like this:

union field_type_info {
  uint16  length;
  struct
  {
    uchar decimals;
    uchar precision;
  } new_decimal;
  struct
  {
    uchar type;
    uchar length;
  } set_or_enum;
  ...
};

Then the code in table_def::calc_field_size could look like this:

uint32 table_def::calc_field_size(uint col, uchar *master_data) const
{
   uint32 length;

   field_type_info *info= &m_field_metadata[col];

   switch (type(col)) {
   case MYSQL_TYPE_NEWDECIMAL:
     length= my_decimal_get_binary_size(info->new_decimal.decimals,
                                        info->new_decimal.precision);
     break;
   case MYSQL_TYPE_DECIMAL:
   case MYSQL_TYPE_FLOAT:
   case MYSQL_TYPE_DOUBLE:
     length= info->length;
     break;
   case MYSQL_TYPE_SET:
     DBUG_ASSERT(info->set_or_enum.type == MYSQL_TYPE_SET);
     length= info->set_or_enum.length;
     break;

   <... and so on>

Even if you don't want to change the code as I suggested, please fix the 
following places in the current one.

In table_def constructor
------------------------

>   table_def(field_type *types, ulong size, uchar *field_metadata, 
>       int metadata_size, uchar *null_bitmap)
>     : m_size(size), m_type(0),
>       m_field_metadata(0), m_null_bits(0), m_memory(NULL)
>   {
>     m_memory= (uchar *)my_multi_malloc(MYF(MY_WME),
>                                        &m_type, size,
>                                        &m_field_metadata,
>                                        size * sizeof(uint16),
>                                        &m_null_bits, (size + 7) / 8,
>                                        NULL);
> 
>     bzero(m_field_metadata, size * sizeof(uint16));
> 
>     if (m_type)
>       memcpy(m_type, types, size);
>     else
>       m_size= 0;
>     /*
>       Extract the data from the table map into the field metadata array
>       iff there is field metadata. The variable metadata_size will be
>       0 if we are replicating from an older version server since no field
>       metadata was written to the table map. This can also happen if 
>       there were no fields in the master that needed extra metadata.
>     */
>     if (m_size && metadata_size)
>     { 
>       int index= 0;
>       for (unsigned int i= 0; i < m_size; i++)
>       {
>         switch (m_type[i]) {
>         case MYSQL_TYPE_TINY_BLOB:
>         case MYSQL_TYPE_BLOB:
>         case MYSQL_TYPE_MEDIUM_BLOB:
>         case MYSQL_TYPE_LONG_BLOB:
>         case MYSQL_TYPE_DOUBLE:
>         case MYSQL_TYPE_FLOAT:
>         {
>           /*
>             These types store a single byte.
>           */
>           m_field_metadata[i]= (uchar)field_metadata[index];

Cast not needed, field_metadata[index] *is* of type uchar. If any, the cast 
should be to uint16 (but this is implicit).

>           index++;
>           break;
>         }
>         case MYSQL_TYPE_SET:
>         case MYSQL_TYPE_ENUM:
>         case MYSQL_TYPE_STRING:
>         {
>           short int x= field_metadata[index++] << 8U; // real_type
>           x = x + field_metadata[index++];            // pack or field length

x should be declared unsigned since it is used as a two 8-bit values here. Why 
short? - it only makes it look more dangerous. Best declare it as uint16, i.e. 
the target type.

I would write "x = x | (0xFF & field_metadata[index++])" to make it more 
explicit what is going on here.

>           m_field_metadata[i]= x;

(
If my suggestion was implemented, this code would look as:
   m_field_metadata[i].set_or_enum.type = field_metadata[index++];
   m_field_metadata[i].set_or_enum.length = field_metadata[index++];
)

>           break;
>         }
>         case MYSQL_TYPE_BIT:
>         {
>           short int x= field_metadata[index++]; 
>           x = x + (field_metadata[index++] << 8U);
>           m_field_metadata[i]= x;

As above.

>           break;
>         }
>         case MYSQL_TYPE_VARCHAR:
>         {
>           /*
>             These types store two bytes.
>           */
>           char *ptr= (char *)&field_metadata[index];
>           m_field_metadata[i]= sint2korr(ptr);

I would use uint2korr here and add asserts in 
Table_map_log_event::save_field_metadata() checking that a negative value is 
never saved in the event.

If, for some reason, we allow negative values to be stored here, the code should 
look as follows:

            char *ptr= (char *)&field_metadata[index];
            int16 x= sint2korr(ptr);
            DBUG_ASSERT(x >= 0);
            m_field_metadata[i]= x;

This is because m_field_metadata[i] is unsigned and you should never store a 
negative value in it.

>           index= index + 2;
>           break;
>         }
>         case MYSQL_TYPE_NEWDECIMAL:
>         {
>           short int x= field_metadata[index++] << 8U; // precision
>           x = x + field_metadata[index++];            // decimals
>           m_field_metadata[i]= x;

As above, declare x to be uint16.

>           break;
>         }
>         default:
>           m_field_metadata[i]= 0;
>           break;
>         }
>       }
>     }

In calc_field_size()
--------------------

> uint32 table_def::calc_field_size(uint col, uchar *master_data) const
> {
>   uint32 length;
> 
>   switch (type(col)) {
>   case MYSQL_TYPE_NEWDECIMAL:
>     length= my_decimal_get_binary_size(m_field_metadata[col] >> 8, 
>              m_field_metadata[col] - ((m_field_metadata[col] >> 8) <<
> 8));

Why "m_field_metadata[col] - ((m_field_metadata[col] >> 8) << 8)"? I thinks
this 
is the same as "m_field_metadata[col]&0xFF" and the latter is much easier to 
understand.

>     break;
>   case MYSQL_TYPE_DECIMAL:
>   case MYSQL_TYPE_FLOAT:
>   case MYSQL_TYPE_DOUBLE:
>     length= m_field_metadata[col];
>     break;
>   case MYSQL_TYPE_SET:
>   case MYSQL_TYPE_ENUM:
>   case MYSQL_TYPE_STRING:
>   {
>     if (((m_field_metadata[col] & 0xff00) == (MYSQL_TYPE_SET << 8)) ||
>         ((m_field_metadata[col] & 0xff00) == (MYSQL_TYPE_ENUM << 8)))

Or, "enum_field_types type= m_field_metadata[col]>>8U; if (type==MYSQL_TYPE_SET 
|| type==MYSQL_TYPE_ENUM)" - saves few bit manipulations.

>       length= m_field_metadata[col] & 0x00ff;
>     else

Please put this else branch as separate case MYSQL_TYPE_STRING. It has virtually 
no common code with the preceding two cases, especially after your patch which 
removes "& 0x00ff" below.

>     {
>       length= m_field_metadata[col] & 0x00ff;
>       DBUG_ASSERT(length > 0);

I would change this assert to "DBUG_ASSERT(length != 0)". Since length is 
unsigned above formulation is confusing.

>       if (length > 255)
>       {
>         DBUG_ASSERT(uint2korr(master_data) > 0);
>         length= uint2korr(master_data) + 2;

To avoid double call to uint2korr:

       length= uint2korr(master_data) + 2;
       DBUG_ASSERT(length > 2);

(actually, one call is inside assert, but anyway...)

>       }
>       else
>         length= (uint) *master_data + 1;
>     }

Could you add a comment explaining why length is read twice here. As we 
discussed, the value from m_field_metadata[col] could be the maximal length (as 
declared in table definition), while the value read from master_data, the actual 
length of the field.

>     break;
>   }
>   case MYSQL_TYPE_YEAR:
>   case MYSQL_TYPE_TINY:
>     length= 1;
>     break;
>   case MYSQL_TYPE_SHORT:
>     length= 2;
>     break;
>   case MYSQL_TYPE_INT24:
>     length= 3;
>     break;
>   case MYSQL_TYPE_LONG:
>     length= 4;
>     break;
> #ifdef HAVE_LONG_LONG
>   case MYSQL_TYPE_LONGLONG:
>     length= 8;
>     break;
> #endif
>   case MYSQL_TYPE_NULL:
>     length= 0;
>     break;
>   case MYSQL_TYPE_NEWDATE:
>     length= 3;
>     break;
>   case MYSQL_TYPE_DATE:
>   case MYSQL_TYPE_TIME:
>     length= 3;
>     break;
>   case MYSQL_TYPE_TIMESTAMP:
>     length= 4;
>     break;
>   case MYSQL_TYPE_DATETIME:
>     length= 8;
>     break;
>   case MYSQL_TYPE_BIT:
>   {
>     uint from_len= (m_field_metadata[col] >> 8U) & 0x00ff;
>     uint from_bit_len= m_field_metadata[col] & 0x00ff;
>     DBUG_ASSERT(from_bit_len <= 7);
>     length= from_len + ((from_bit_len > 0) ? 1 : 0);

Could you add some explanatory comment (I think I understand what is going on 
here but would be good to explain it in a comment). For example document 
from_len and from_bit_len variables.

>     break;
>   }
>   case MYSQL_TYPE_VARCHAR:
>   {
>     length= m_field_metadata[col] > 255 ? 2 : 1; // c&p of
> Field_varstring::data_length()
>     DBUG_ASSERT(uint2korr(master_data) > 0);
>     length+= length == 1 ? (uint32) *master_data : uint2korr(master_data);
>     break;

Looks the same as MYSQL_TYPE_STRING - why not handle them together.

>   }
>   case MYSQL_TYPE_TINY_BLOB:
>   case MYSQL_TYPE_MEDIUM_BLOB:
>   case MYSQL_TYPE_LONG_BLOB:
>   case MYSQL_TYPE_BLOB:
>   case MYSQL_TYPE_GEOMETRY:
>   {
> #if 1
>     /*
>       BUG#29549: 
>       This is currently broken for NDB, which is using big-endian
>       order when packing length of BLOB. Once they have decided how to
>       fix the issue, we can enable the code below to make sure to
>       always read the length in little-endian order.
>     */
>     Field_blob fb(m_field_metadata[col]);
>     length= fb.get_packed_size(master_data, TRUE);
> #else
>     /*
>       Compute the length of the data. We cannot use get_length() here
>       since it is dependent on the specific table (and also checks the
>       packlength using the internal 'table' pointer) and replication
>       is using a fixed format for storing data in the binlog.
>     */
>     switch (m_field_metadata[col]) {
>     case 1:
>       length= *master_data;
>       break;
>     case 2:
>       length= sint2korr(master_data);
>       break;
>     case 3:
>       length= uint3korr(master_data);
>       break;
>     case 4:
>       length= uint4korr(master_data);
>       break;
>     default:
>       DBUG_ASSERT(0);		// Should not come here
>       break;

Strange that sint*korr is sometimes used and sometimes uint*korr. Shouldn't it 
be always uint*korr?

I also wonder why you sometimes use int*store/korr functions and sometimes code 
numbers "by hand". Would be more consistent to use one of these methods. Using 
int*store/korr functions has a drawback that if someone changes their 
implementation the code will break (not able to correctly read old binlogs).

>     }
> 
>     length+= m_field_metadata[col];
> #endif
>     break;
>   }
>   default:
>     length= -1;
>   }
>   return length;
> }

Best,
Rafal

cbell@stripped wrote:
> Below is the list of changes that have just been committed into a local
> 5.1 repository of cbell. When cbell does a push these changes will
> be propagated to the main repository and, within 24 hours after the
> push, to the public repository.
> For information on how to access the public repository
> see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html
> 
> ChangeSet@stripped, 2007-09-07 13:31:38-04:00, cbell@mysql_cab_desk. +2 -0
>   BUG#30790 : Suspicious code in rpl_utility.cc
>   
>   This patch clarifies some of the coding choices with documentationa and
>   removes a limitation in the code for future expansion of the CHAR and
>   BINARY fields to length > 255.
> 
>   sql/log_event.cc@stripped, 2007-09-07 13:31:26-04:00, cbell@mysql_cab_desk. +10 -0
>     BUG#30790 : Suspicious code in rpl_utility.cc
>     
>     This patch adds comments to help explain the choice of variable types.
> 
>   sql/rpl_utility.cc@stripped, 2007-09-07 13:31:26-04:00, cbell@mysql_cab_desk. +5 -1
>     BUG#30790 : Suspicious code in rpl_utility.cc
>     
>     This patch removes the mask to allow length to be > 255. This is a 
>     future expansion measure since CHAR fields are limited to 255 characters.
>     The code mirrors that of the packing methods in field.cc that are written
>     to permit > 255 characters for CHAR -type fields should that change be
>     made in the future.
> 
> diff -Nrup a/sql/log_event.cc b/sql/log_event.cc
> --- a/sql/log_event.cc	2007-08-27 14:27:42 -04:00
> +++ b/sql/log_event.cc	2007-09-07 13:31:26 -04:00
> @@ -6469,6 +6469,16 @@ void Rows_log_event::print_helper(FILE *
>    data) in the table map are initialized as zero (0). The array size is the 
>    same as the columns for the table on the slave.
>  
> +  Additionally, values saved for field metadata on the master are saved as a 
> +  string of bytes (uchar) in the binlog. A field may require 1 or more bytes
> +  to store the information. In cases where values require multiple bytes 
> +  (e.g. values > 255), the endian-safe methods are used to properly encode 
> +  the values on the master and decode them on the slave. When the field
> +  metadata values are captured on the slave, they are stored in an array of
> +  type uint16. This allows the least number of casts to prevent casting bugs
> +  when the field metadata is used in comparisons of field attributes. When
> +  the field metadata is used for calculating addresses in pointer math, the
> +  type used is uint32. 
>  */
>  
>  /**
> diff -Nrup a/sql/rpl_utility.cc b/sql/rpl_utility.cc
> --- a/sql/rpl_utility.cc	2007-08-27 07:43:58 -04:00
> +++ b/sql/rpl_utility.cc	2007-09-07 13:31:26 -04:00
> @@ -47,7 +47,11 @@ uint32 table_def::calc_field_size(uint c
>        length= m_field_metadata[col] & 0x00ff;
>      else
>      {
> -      length= m_field_metadata[col] & 0x00ff;
> +      /*
> +        Currently, MYSQL_TYPE_STRING is limited to 255 characters. The 
> +        following code is implemented to allow the expansion of that
> +        restriction.
> +      */
>        DBUG_ASSERT(length > 0);
>        if (length > 255)
>        {
> 
> 
Thread
bk commit into 5.1 tree (cbell:1.2570) BUG#30790cbell7 Sep
  • Re: bk commit into 5.1 tree (cbell:1.2570) BUG#30790Rafal Somla10 Sep