List:Commits« Previous MessageNext Message »
From:Chuck Bell Date:July 27 2007 3:39am
Subject:RE: bk commit into 5.1 tree (cbell:1.2564)
View as plain text  
Mats,  

Here are the latest patches. I had to make 2 minor corrections to the
WL#3228 patch (I failed to update the calc_field_size with the changes to
the bit::unpack() method and I made an error on one correction from the
first review). I also changed one line in the WL#3915 patch to accommodate
the renamed method.

WL#3228: http://lists.mysql.com/commits/31667
WL#3915: http://lists.mysql.com/commits/31668

The WL#3228 patch has passed all tests from the test suite. The WL#3915
patch has passed a subset of the test suite (a list of all affected tests as
shown below) I will run the full test suite on the WL#3915 patch tomorrow
while I await your review, but I am confident everything will pass as the
WL#3915 patch does not affect any other tests except the new ones.

Tests modified/affected (includes WL#3915 tests):
./mysql-test-run.pl --mysqld=--binlog-format=row --force rpl_colSize
rpl_row_max_relay_size  rpl_skip_error rpl_insert_id
rpl_switch_stm_row_mixed rpl_trigger rpl_user_variables
binlog_row_mix_innodb_myisam rpl000017 rpl_do_grant rpl_ignore_table
rpl_row_001 rpl_row_basic_11bugs rpl_row_create_table rpl_row_flsh_tbls
rpl_slave_status rpl_ndb_dd_advance rpl_ndb_log rpl_ndb_row_001 rpl_row_user
rpl_misc_functions rpl_multi_engine rpl_row_basic_8partition
rpl_row_mysqlbinlog rpl_row_sp001 rpl_row_sp011 rpl_row_trig003
rpl_row_trig004 rpl_sp004 rpl_row_inexist_tbl rpl_row_log rpl_row_log_innodb
rpl_ndb_basic rpl_ndb_relayrotate rpl_truncate_7ndb rpl_rbr_to_sbr
rpl_extraColmaster_myisam rpl_extraColmaster_innodb
rpl_row_extraColMaster_ndb 

Chuck

> -----Original Message-----
> From: Chuck Bell [mailto:cbell@stripped] 
> Sent: Thursday, July 26, 2007 3:57 PM
> To: 'Mats Kindahl'
> Cc: commits@stripped
> Subject: RE: bk commit into 5.1 tree (cbell:1.2564)
> 
> Hi Mats,
> 
> Here is the link to my latest patch:
> 
> http://lists.mysql.com/commits/31644
> 
> Below are my responses to your comments and suggestions. 
> Those not listed or not commented upon were implemented as requested. 
> 
> > > +const char *Field_new_decimal::unpack(char* to, 
> > > +                                      const char *from, 
> > > +                                      uint from_length) {
> > > +  uint from_precision= (from_length & 0xff00) >> 8U;
> > > +  uint from_decimal= from_length & 0x00ff;
> > > +  uint length=pack_length();
> > > +  uint from_pack_len=
> > my_decimal_get_binary_size(from_precision, from_decimal);
> > > +  uint len= (from_length && (from_pack_len < length)) ?
> > > +            from_pack_len : length;
> > >   
> > 
> > Can from_length be 0 for a DECIMAL? I don't think it can (after 
> > checking, I'm sure it can't be), so the check for 
> "from_length" in the 
> > conjunction above is unnecessary (but an assertion does not hurt 
> > here).
> 
> Yes, it can if we are replicating with an older version 
> server. In that case, the from_length (hence renamed to 
> param_data) can be 0.
> 
> > 
> > > +  if (from_pack_len && (from_pack_len < length))
> > >   
> > 
> > Can from_pack_len be zero (I'm very sure it can't, check inside 
> > decimal_bin_size())? Otherwise, the check is unnecessary.
> 
> See above.
> 
> > 
> > > +    decimal2double(&dec, &dbl);
> > > +    double2decimal(dbl, &dec);
> > >   
> > 
> > These two statements are very slow: the latter one is 
> converting the 
> > double to a string before actually converting it to a 
> decimal. I don't 
> > know if it is possible to convert them by working with the 
> decimal_t 
> > buffer (it seems feasible), but that would definitely speed 
> things up.
> 
> They have been eliminated. :)
> 
> > > +const char *Field_string::unpack(char *to, const char
> > *from, uint from_length)
> > > +{
> > > +  uint from_len= (from_length & 0xff00) >> 8U;    // real_type.
> > > +  from_len= from_length & 0x00ff;                 // length.
> > >   
> > 
> > Here you overwrite the value of from_len with the lower half of
> > from_length: is this really intentional? Note: from_len is 
> now limited 
> > to the range [0,255], but both CHAR and VARCHAR can have a 
> parameter 
> > in the range [0,2^16). I also suggest adding const qualifiers to 
> > values that you don't intend to change: it is easier to catch this 
> > kind of mistakes that way; i.e.:
> > 
> > uint const from_len= ...;
> 
> I do not need the upper part of param_data here. It stores 
> the real_type() which is only used by the WL#3915 patch. Note 
> that the documentation states CHAR fields must be <= 255 
> bytes. The server throws an error if you attempt to allocate 
> more than 255 bytes to a CHAR field.
> 
> > 
> > > +  uint length= 0;
> > > +  uint f_length= (from_length && (from_len < field_length)) ? 
> > > +                from_len : field_length;  if (f_length > 255)
> > >   
> > 
> > Umh... when field_length is > 255, it can never be the case that 
> > from_len < field_length is false since from_len is bounded to the 
> > range [0,255]. This means that if the field length on the master is 
> > larger than 255 and the field length on the slave is the same as on 
> > the master, the expression from_len < field_length is false even 
> > though the field_length on the master and slave are identical.
> 
> See refactored code.
> 
> > 
> > > +  {
> > > +    length= uint2korr(from);
> > > +    from+= 2;
> > > +  }
> > > +  else
> > > +    length= (uint) *from++;
> > > +  memcpy(to, from, (int) length);
> > > +  bfill(to+length, field_length - length, ' ');
> > >   
> > 
> > Isn't there a fill function for CHAR and VARCHAR fields? 
> > Hardcoding it
> > to space introduce the possibility of duplicating code, 
> which in turn 
> > can introduce subtle bugs. Indeed, in the Field_string::store() 
> > function, there is the following code to actually perform 
> the padding:
> > 
> >   if (copy_length < field_length)
> >     field_charset->cset->fill(field_charset,(char*) ptr+copy_length,
> >                               field_length-copy_length,
> >                               field_charset->pad_char);
> > 
> > I actually think that you can use the Field_string::store() 
> function 
> > instead of rolling your own version as you've done above.
> 
> Store is now used where possible.
> 
> > 
> > > +/*
> > >   
> > 
> > I think you can use Field_varstring::store() here as well, 
> similar to 
> > above.
> 
> See above.
> 
> > > +  for blob fields.
> > > +*/
> > > +const char *Field_blob::unpack(char *to, const char *from,
> > uint from_length)
> > > +{
> > > +  return unpack(to, from);
> > > +}
> > >   
> > 
> > Are you storing the from_length (or param_data, or whatever it is
> > called) for BLOB fields or are you optimizing it away? My personal 
> > opinion is that you should keep simple (i.e., linear) relations 
> > between positions in the fields since that allow the compiler to 
> > generate more efficient code, e.g., something like "param_data = 
> > param_array[column_pos]". See below for an example.
> 
> I am storing but not using it in the unpack method. This is 
> another case of carrying over data for the WL#3915 patch. It 
> uses it to "skip" extra blob fields in the master, but it 
> needs to know what type of blob it is.
> 
> > Can n be zero in a BIT(n) field? According to the manual, 
> it can't. No 
> > harm in replacing this if with an assertion though.
> 
> Ok. Interestingly, you can do this:
> 
> CREATE TABLE y (BIT(0));
> 
> ...and it creates a bit field of length 1. So assert really 
> isn't needed here.
> 
> > 
> > > +  {
> > > +    /*
> > > +      set_rec_bits is a macro, don't put the 
> post-increment in the
> > > +      argument since that might cause strange side-effects.
> > > +
> > > +      For the choice of the second argument, see the
> > explanation for
> > > +      Field_bit::pack().
> > > +    */
> > > +    set_rec_bits(*from, bit_ptr + (to - ptr), bit_ofs,
> > from_bit_len);
> > > +    from++;
> > > +  }
> > > +  memcpy(to, from, from_len);
> > > +  return from + from_len;
> > >   
> > 
> > Check if you can use Field_bit::store() here as well. Might 
> not be the 
> > case, but code duplication is a bad thing and should be avoided as 
> > much as possible.
> 
> See code for complete rewrite of this function per our IRC and emails.
> 
> > 
> > >  }
> > >  
> > >  
> > >
> > > --- 1.226/sql/field.h	2007-07-23 17:19:38 -04:00
> > > +++ 1.227/sql/field.h	2007-07-23 17:19:38 -04:00
> > > @@ -342,6 +342,36 @@
> > >      memcpy(to,from,length);
> > >      return to+length;
> > >    }
> > > +  virtual const char *unpack(char* to, const char *from,
> > uint from_length)
> > > +  {
> > > +    uint length=pack_length();
> > > +    int from_type= 0;
> > > +    /*
> > > +      If from length is > 255, it has encoded data in the
> > upper bits. Need
> > > +      to mask it out.
> > > +    */
> > > +    if (from_length > 255)
> > > +    {
> > > +      from_type= (from_length & 0xff00) >> 8U;  // real_type.
> > > +      from_length= from_length & 0x00ff;        // length.
> > > +    }
> > > +    uint len= (from_length && (from_length < length)) ?
> > > +              from_length : length;
> > > +    /*
> > > +      If the length is the same, use old unpack method.
> > > +      If the from_length is 0, use the old unpack method.
> > >   
> > 
> > What does it mean that the from_length is 0?
> 
> This is a case where NDB can produce a row that has a length 
> of 0. I don't know why but it comes in as length 0 but there 
> is data in the row. So I use the old unpack() method and that 
> handles the situation well.
> 
> > 
> > > +      If the real_types are not the same, use the old
> > unpack method.
> > >   
> > 
> > When can this occur? It seems strange that this would be 
> allowed, but 
> > it is possible.
> 
> See above.
> 
> > 
> > > +  /*
> > > +    We need to delay this clear until the table def is no
> > longer needed.
> > > +    The table def is needed in unpack_row().
> > > +  */
> > > +  if (rli->tables_to_lock && get_flags(STMT_END_F))
> > > +    const_cast<RELAY_LOG_INFO*>(rli)->clear_tables_to_lock();
> > >   
> > 
> > Ah! IC. But why do you need to keep the table map events 
> around? Isn't 
> > that what table_def is to be used for? (I can see reasons 
> to keeping 
> > the table maps around, but why are *you* doing it.)
> 
> I needed to keep the table maps around because the 
> clear_tables_to_lock() method deletes the table_def class 
> instances. :( Which I extended to handle the field metadata.
> 
> > 
> > > +
> > >    if (error)
> > >    {                     /* error has occured during the 
> > transaction */
> > >      slave_print_msg(ERROR_LEVEL, rli, thd->net.last_errno, @@ 
> > > -6314,6 +6318,147 @@
> > >  	Table_map_log_event member functions and support functions
> > >  
> > **************************************************************
> > ************/
> > >  
> > > +/**
> > > +  * Calculate field metadata size based on the real_type
> > of the field.
> > > +  *
> > > +  * @returns int Size of field metadata.
> > > +  */
> > >   
> > 
> > As I said before: I don't see a big advantage to keeping 
> > different-sized data for different fields, especially since 
> you only 
> > save one byte for BLOBs (and for IEEE numbers, but they are 
> not that 
> > common in a database), which are usually large. The advantages to 
> > keeping a constant size for each fields improves performance 
> > significantly (provided the code is written correctly).
> 
> Allow me to explain... The goal was to use as few bytes as 
> possible in the table map. Thus, I had to pack things in a 
> little tight being skimpy on bytes on the master side. 
> However, on the slave side I'm in memory now so I spread out 
> and use short ints for the array to make the code easier to 
> work with. Had it not been for wanting to keep the size of 
> the metadata down, I would have used short ints everywhere.
> 
> > I think you should call it field type parameters instead.
> 
> We had a discussion about what to call it earlier and decided 
> on field metadata because field type information was too 
> easily confused with the type array already in the table map. 
> However, I'm not opposed to renaming it.
> 
> 
> > After reading a little closer, I see that 
> > Table_map_log_event::m_field_metadata and 
> table_def::m_field_metadata 
> > are "false friends" in that they have different types. 
> Please change 
> > either the name of one of the fields, or use the same type for both 
> > fields to avoid confusion. As a side note: if you switch to the 
> > suggestion I had above (using virtual functions instead of 
> a switch), 
> > the two fields will have the same type.
> 
> I renamed the methods and changed some of the return types to 
> help make the distinction. I decided to not switch to 
> individual functions in order to keep on schedule for 
> release. Had I more time I think I would have liked to 
> explore that option. It would have made for a bit cleaner 
> looking code in log_event.cc.
> 
> > Since additional columns are always last in the row, why is this 
> > needed?
> > I do agree that we need to store this field type information in the 
> > table map, but the above is not the reason. (What does "columns ... 
> > that do not exist" mean?)
> 
> Removed.
> 
> > 
> > > +  m_num_null_bytes= (m_table->s->fields + 7) / 8;  m_data_size+= 
> > > + m_num_null_bytes;  m_null_bits= new 
> uchar[m_num_null_bytes];  for 
> > > + (uint i = 0; i < m_num_null_bytes; i++)
> > > +    m_null_bits[i]= 0;
> > > +  for (unsigned int i= 0 ; i < m_table->s->fields ; ++i)
> > > +    if (m_table->field[i]->maybe_null())
> > > +      m_null_bits[(i / 8)]+= 1 << (i % 8);
> > >   
> > 
> > There is a MY_BITMAP with assorted functions to manipulate 
> bitmaps. If 
> > you feel inclined, you can use that instead, but it is not 
> necessary 
> > in this case since you are just generating data to save.
> 
> I'll pass since I am doing some very simple things and not 
> setting/resetting bits.
> 
> Ref: rpl_colSize test 
> 
> > For the CHAR and VARCHAR, you need to test all 
> combinations, i.e., for 
> > CHAR/VARCHAR column on master of type CHAR(m) and column on 
> slave of 
> > type CHAR(s)
> > 
> > Master Slave
> > <256   <256   with m < s, m > s, and m == s
> >  >255   <256   with m < s, m > s, and m == s
> > <256   >255   with m < s, m > s, and m == s
> >  >255   >255   with m < s, m > s, and m == s
> > 
> > You also need to test BIT(m) (on master) and BIT(s) (on
> > slave) so that:
> > 
> > m < s, m % 8 != 0, and s % 8 == 0
> > m < s, m % 8 == 0, and s % 8 != 0
> > m < s, m % 8 != 0, and s % 8 != 0
> > m > s, m % 8 != 0, and s % 8 == 0
> > m > s, m % 8 == 0, and s % 8 != 0
> > m > s, m % 8 != 0, and s % 8 != 0
> > 
> > There is no need to bother about bit fields that where both are 
> > divisible by 8, since they don't store any bits among the 
> null bits of 
> > the record.
> 
> I have placed new tests in for char and varchar and bit. 
> Since char fields can only be 255 bytes, the test is simply 
> <256, <256. For varchar and bit fields where the size of the 
> master can be > than the slave, BUG#22086 will implement code 
> to detect and prevent that. 
> 
> > >    case MYSQL_TYPE_TINY_BLOB:
> > >    case MYSQL_TYPE_MEDIUM_BLOB:
> > >    case MYSQL_TYPE_LONG_BLOB:
> > >    case MYSQL_TYPE_BLOB:
> > >    case MYSQL_TYPE_GEOMETRY:
> > > -    length= ~(uint32) 0;                               // NYI
> > > +  {
> > > +    Field_blob *fb= new Field_blob(m_field_metadata[col]);
> > > +    length= fb->get_packed_size(master_data);
> > > +    delete fb;
> > >   
> > 
> > Ugh... to allocate a new object just to get the size seems like a 
> > waste of cycles. To allocate a new object *dynamically* (on the
> > heap) just to
> > get the size is really a waste of cycles. Can't you get the 
> size some 
> > other way?
> 
> Yes, this was a source of much brainstorming by Andrei and I. 
> We finally decided that this was the only way we could handle 
> getting the field metadata. Notice that we added a new 
> constructor to do this that should make for a fairly execution.
> 
> 
> General
> -------
> If you see anything else you didn't spot the first time or 
> have additional questions, let me know and I'll be available 
> to talk through them via Skype or IRC.
> 
> I plan to be in to work on Friday @0900 EDT. I will begin 
> getting ready for the push and hopefully everything is Ok 
> with the second review and I can start the push tomorrow morning.
> 
> For the rest of this afternoon I will finish running the full 
> test suite in order to try and minimize any possible issues 
> during pushbuild for both this patch and the one for WL#3915 
> which relies on this patch. I know it's no guarantee, but it 
> should eliminate most of the DOH!'s. :)
> 
> Chuck
> 
> 
> --
> MySQL Code Commits Mailing List
> For list archives: http://lists.mysql.com/commits
> To unsubscribe:    
> http://lists.mysql.com/commits?unsub=1
> 

Thread
bk commit into 5.1 tree (cbell:1.2564)cbell23 Jul
  • Re: bk commit into 5.1 tree (cbell:1.2564)Mats Kindahl25 Jul
    • RE: bk commit into 5.1 tree (cbell:1.2564)Chuck Bell26 Jul
      • RE: bk commit into 5.1 tree (cbell:1.2564)Chuck Bell27 Jul
      • Re: bk commit into 5.1 tree (cbell:1.2564)Mats Kindahl27 Jul