Mats,
Here are the latest patches. I had to make 2 minor corrections to the
WL#3228 patch (I failed to update the calc_field_size with the changes to
the bit::unpack() method and I made an error on one correction from the
first review). I also changed one line in the WL#3915 patch to accommodate
the renamed method.
WL#3228: http://lists.mysql.com/commits/31667
WL#3915: http://lists.mysql.com/commits/31668
The WL#3228 patch has passed all tests from the test suite. The WL#3915
patch has passed a subset of the test suite (a list of all affected tests as
shown below) I will run the full test suite on the WL#3915 patch tomorrow
while I await your review, but I am confident everything will pass as the
WL#3915 patch does not affect any other tests except the new ones.
Tests modified/affected (includes WL#3915 tests):
./mysql-test-run.pl --mysqld=--binlog-format=row --force rpl_colSize
rpl_row_max_relay_size rpl_skip_error rpl_insert_id
rpl_switch_stm_row_mixed rpl_trigger rpl_user_variables
binlog_row_mix_innodb_myisam rpl000017 rpl_do_grant rpl_ignore_table
rpl_row_001 rpl_row_basic_11bugs rpl_row_create_table rpl_row_flsh_tbls
rpl_slave_status rpl_ndb_dd_advance rpl_ndb_log rpl_ndb_row_001 rpl_row_user
rpl_misc_functions rpl_multi_engine rpl_row_basic_8partition
rpl_row_mysqlbinlog rpl_row_sp001 rpl_row_sp011 rpl_row_trig003
rpl_row_trig004 rpl_sp004 rpl_row_inexist_tbl rpl_row_log rpl_row_log_innodb
rpl_ndb_basic rpl_ndb_relayrotate rpl_truncate_7ndb rpl_rbr_to_sbr
rpl_extraColmaster_myisam rpl_extraColmaster_innodb
rpl_row_extraColMaster_ndb
Chuck
> -----Original Message-----
> From: Chuck Bell [mailto:cbell@stripped]
> Sent: Thursday, July 26, 2007 3:57 PM
> To: 'Mats Kindahl'
> Cc: commits@stripped
> Subject: RE: bk commit into 5.1 tree (cbell:1.2564)
>
> Hi Mats,
>
> Here is the link to my latest patch:
>
> http://lists.mysql.com/commits/31644
>
> Below are my responses to your comments and suggestions.
> Those not listed or not commented upon were implemented as requested.
>
> > > +const char *Field_new_decimal::unpack(char* to,
> > > + const char *from,
> > > + uint from_length) {
> > > + uint from_precision= (from_length & 0xff00) >> 8U;
> > > + uint from_decimal= from_length & 0x00ff;
> > > + uint length=pack_length();
> > > + uint from_pack_len=
> > my_decimal_get_binary_size(from_precision, from_decimal);
> > > + uint len= (from_length && (from_pack_len < length)) ?
> > > + from_pack_len : length;
> > >
> >
> > Can from_length be 0 for a DECIMAL? I don't think it can (after
> > checking, I'm sure it can't be), so the check for
> "from_length" in the
> > conjunction above is unnecessary (but an assertion does not hurt
> > here).
>
> Yes, it can if we are replicating with an older version
> server. In that case, the from_length (hence renamed to
> param_data) can be 0.
>
> >
> > > + if (from_pack_len && (from_pack_len < length))
> > >
> >
> > Can from_pack_len be zero (I'm very sure it can't, check inside
> > decimal_bin_size())? Otherwise, the check is unnecessary.
>
> See above.
>
> >
> > > + decimal2double(&dec, &dbl);
> > > + double2decimal(dbl, &dec);
> > >
> >
> > These two statements are very slow: the latter one is
> converting the
> > double to a string before actually converting it to a
> decimal. I don't
> > know if it is possible to convert them by working with the
> decimal_t
> > buffer (it seems feasible), but that would definitely speed
> things up.
>
> They have been eliminated. :)
>
> > > +const char *Field_string::unpack(char *to, const char
> > *from, uint from_length)
> > > +{
> > > + uint from_len= (from_length & 0xff00) >> 8U; // real_type.
> > > + from_len= from_length & 0x00ff; // length.
> > >
> >
> > Here you overwrite the value of from_len with the lower half of
> > from_length: is this really intentional? Note: from_len is
> now limited
> > to the range [0,255], but both CHAR and VARCHAR can have a
> parameter
> > in the range [0,2^16). I also suggest adding const qualifiers to
> > values that you don't intend to change: it is easier to catch this
> > kind of mistakes that way; i.e.:
> >
> > uint const from_len= ...;
>
> I do not need the upper part of param_data here. It stores
> the real_type() which is only used by the WL#3915 patch. Note
> that the documentation states CHAR fields must be <= 255
> bytes. The server throws an error if you attempt to allocate
> more than 255 bytes to a CHAR field.
>
> >
> > > + uint length= 0;
> > > + uint f_length= (from_length && (from_len < field_length)) ?
> > > + from_len : field_length; if (f_length > 255)
> > >
> >
> > Umh... when field_length is > 255, it can never be the case that
> > from_len < field_length is false since from_len is bounded to the
> > range [0,255]. This means that if the field length on the master is
> > larger than 255 and the field length on the slave is the same as on
> > the master, the expression from_len < field_length is false even
> > though the field_length on the master and slave are identical.
>
> See refactored code.
>
> >
> > > + {
> > > + length= uint2korr(from);
> > > + from+= 2;
> > > + }
> > > + else
> > > + length= (uint) *from++;
> > > + memcpy(to, from, (int) length);
> > > + bfill(to+length, field_length - length, ' ');
> > >
> >
> > Isn't there a fill function for CHAR and VARCHAR fields?
> > Hardcoding it
> > to space introduce the possibility of duplicating code,
> which in turn
> > can introduce subtle bugs. Indeed, in the Field_string::store()
> > function, there is the following code to actually perform
> the padding:
> >
> > if (copy_length < field_length)
> > field_charset->cset->fill(field_charset,(char*) ptr+copy_length,
> > field_length-copy_length,
> > field_charset->pad_char);
> >
> > I actually think that you can use the Field_string::store()
> function
> > instead of rolling your own version as you've done above.
>
> Store is now used where possible.
>
> >
> > > +/*
> > >
> >
> > I think you can use Field_varstring::store() here as well,
> similar to
> > above.
>
> See above.
>
> > > + for blob fields.
> > > +*/
> > > +const char *Field_blob::unpack(char *to, const char *from,
> > uint from_length)
> > > +{
> > > + return unpack(to, from);
> > > +}
> > >
> >
> > Are you storing the from_length (or param_data, or whatever it is
> > called) for BLOB fields or are you optimizing it away? My personal
> > opinion is that you should keep simple (i.e., linear) relations
> > between positions in the fields since that allow the compiler to
> > generate more efficient code, e.g., something like "param_data =
> > param_array[column_pos]". See below for an example.
>
> I am storing but not using it in the unpack method. This is
> another case of carrying over data for the WL#3915 patch. It
> uses it to "skip" extra blob fields in the master, but it
> needs to know what type of blob it is.
>
> > Can n be zero in a BIT(n) field? According to the manual,
> it can't. No
> > harm in replacing this if with an assertion though.
>
> Ok. Interestingly, you can do this:
>
> CREATE TABLE y (BIT(0));
>
> ...and it creates a bit field of length 1. So assert really
> isn't needed here.
>
> >
> > > + {
> > > + /*
> > > + set_rec_bits is a macro, don't put the
> post-increment in the
> > > + argument since that might cause strange side-effects.
> > > +
> > > + For the choice of the second argument, see the
> > explanation for
> > > + Field_bit::pack().
> > > + */
> > > + set_rec_bits(*from, bit_ptr + (to - ptr), bit_ofs,
> > from_bit_len);
> > > + from++;
> > > + }
> > > + memcpy(to, from, from_len);
> > > + return from + from_len;
> > >
> >
> > Check if you can use Field_bit::store() here as well. Might
> not be the
> > case, but code duplication is a bad thing and should be avoided as
> > much as possible.
>
> See code for complete rewrite of this function per our IRC and emails.
>
> >
> > > }
> > >
> > >
> > >
> > > --- 1.226/sql/field.h 2007-07-23 17:19:38 -04:00
> > > +++ 1.227/sql/field.h 2007-07-23 17:19:38 -04:00
> > > @@ -342,6 +342,36 @@
> > > memcpy(to,from,length);
> > > return to+length;
> > > }
> > > + virtual const char *unpack(char* to, const char *from,
> > uint from_length)
> > > + {
> > > + uint length=pack_length();
> > > + int from_type= 0;
> > > + /*
> > > + If from length is > 255, it has encoded data in the
> > upper bits. Need
> > > + to mask it out.
> > > + */
> > > + if (from_length > 255)
> > > + {
> > > + from_type= (from_length & 0xff00) >> 8U; // real_type.
> > > + from_length= from_length & 0x00ff; // length.
> > > + }
> > > + uint len= (from_length && (from_length < length)) ?
> > > + from_length : length;
> > > + /*
> > > + If the length is the same, use old unpack method.
> > > + If the from_length is 0, use the old unpack method.
> > >
> >
> > What does it mean that the from_length is 0?
>
> This is a case where NDB can produce a row that has a length
> of 0. I don't know why but it comes in as length 0 but there
> is data in the row. So I use the old unpack() method and that
> handles the situation well.
>
> >
> > > + If the real_types are not the same, use the old
> > unpack method.
> > >
> >
> > When can this occur? It seems strange that this would be
> allowed, but
> > it is possible.
>
> See above.
>
> >
> > > + /*
> > > + We need to delay this clear until the table def is no
> > longer needed.
> > > + The table def is needed in unpack_row().
> > > + */
> > > + if (rli->tables_to_lock && get_flags(STMT_END_F))
> > > + const_cast<RELAY_LOG_INFO*>(rli)->clear_tables_to_lock();
> > >
> >
> > Ah! IC. But why do you need to keep the table map events
> around? Isn't
> > that what table_def is to be used for? (I can see reasons
> to keeping
> > the table maps around, but why are *you* doing it.)
>
> I needed to keep the table maps around because the
> clear_tables_to_lock() method deletes the table_def class
> instances. :( Which I extended to handle the field metadata.
>
> >
> > > +
> > > if (error)
> > > { /* error has occured during the
> > transaction */
> > > slave_print_msg(ERROR_LEVEL, rli, thd->net.last_errno, @@
> > > -6314,6 +6318,147 @@
> > > Table_map_log_event member functions and support functions
> > >
> > **************************************************************
> > ************/
> > >
> > > +/**
> > > + * Calculate field metadata size based on the real_type
> > of the field.
> > > + *
> > > + * @returns int Size of field metadata.
> > > + */
> > >
> >
> > As I said before: I don't see a big advantage to keeping
> > different-sized data for different fields, especially since
> you only
> > save one byte for BLOBs (and for IEEE numbers, but they are
> not that
> > common in a database), which are usually large. The advantages to
> > keeping a constant size for each fields improves performance
> > significantly (provided the code is written correctly).
>
> Allow me to explain... The goal was to use as few bytes as
> possible in the table map. Thus, I had to pack things in a
> little tight being skimpy on bytes on the master side.
> However, on the slave side I'm in memory now so I spread out
> and use short ints for the array to make the code easier to
> work with. Had it not been for wanting to keep the size of
> the metadata down, I would have used short ints everywhere.
>
> > I think you should call it field type parameters instead.
>
> We had a discussion about what to call it earlier and decided
> on field metadata because field type information was too
> easily confused with the type array already in the table map.
> However, I'm not opposed to renaming it.
>
>
> > After reading a little closer, I see that
> > Table_map_log_event::m_field_metadata and
> table_def::m_field_metadata
> > are "false friends" in that they have different types.
> Please change
> > either the name of one of the fields, or use the same type for both
> > fields to avoid confusion. As a side note: if you switch to the
> > suggestion I had above (using virtual functions instead of
> a switch),
> > the two fields will have the same type.
>
> I renamed the methods and changed some of the return types to
> help make the distinction. I decided to not switch to
> individual functions in order to keep on schedule for
> release. Had I more time I think I would have liked to
> explore that option. It would have made for a bit cleaner
> looking code in log_event.cc.
>
> > Since additional columns are always last in the row, why is this
> > needed?
> > I do agree that we need to store this field type information in the
> > table map, but the above is not the reason. (What does "columns ...
> > that do not exist" mean?)
>
> Removed.
>
> >
> > > + m_num_null_bytes= (m_table->s->fields + 7) / 8; m_data_size+=
> > > + m_num_null_bytes; m_null_bits= new
> uchar[m_num_null_bytes]; for
> > > + (uint i = 0; i < m_num_null_bytes; i++)
> > > + m_null_bits[i]= 0;
> > > + for (unsigned int i= 0 ; i < m_table->s->fields ; ++i)
> > > + if (m_table->field[i]->maybe_null())
> > > + m_null_bits[(i / 8)]+= 1 << (i % 8);
> > >
> >
> > There is a MY_BITMAP with assorted functions to manipulate
> bitmaps. If
> > you feel inclined, you can use that instead, but it is not
> necessary
> > in this case since you are just generating data to save.
>
> I'll pass since I am doing some very simple things and not
> setting/resetting bits.
>
> Ref: rpl_colSize test
>
> > For the CHAR and VARCHAR, you need to test all
> combinations, i.e., for
> > CHAR/VARCHAR column on master of type CHAR(m) and column on
> slave of
> > type CHAR(s)
> >
> > Master Slave
> > <256 <256 with m < s, m > s, and m == s
> > >255 <256 with m < s, m > s, and m == s
> > <256 >255 with m < s, m > s, and m == s
> > >255 >255 with m < s, m > s, and m == s
> >
> > You also need to test BIT(m) (on master) and BIT(s) (on
> > slave) so that:
> >
> > m < s, m % 8 != 0, and s % 8 == 0
> > m < s, m % 8 == 0, and s % 8 != 0
> > m < s, m % 8 != 0, and s % 8 != 0
> > m > s, m % 8 != 0, and s % 8 == 0
> > m > s, m % 8 == 0, and s % 8 != 0
> > m > s, m % 8 != 0, and s % 8 != 0
> >
> > There is no need to bother about bit fields that where both are
> > divisible by 8, since they don't store any bits among the
> null bits of
> > the record.
>
> I have placed new tests in for char and varchar and bit.
> Since char fields can only be 255 bytes, the test is simply
> <256, <256. For varchar and bit fields where the size of the
> master can be > than the slave, BUG#22086 will implement code
> to detect and prevent that.
>
> > > case MYSQL_TYPE_TINY_BLOB:
> > > case MYSQL_TYPE_MEDIUM_BLOB:
> > > case MYSQL_TYPE_LONG_BLOB:
> > > case MYSQL_TYPE_BLOB:
> > > case MYSQL_TYPE_GEOMETRY:
> > > - length= ~(uint32) 0; // NYI
> > > + {
> > > + Field_blob *fb= new Field_blob(m_field_metadata[col]);
> > > + length= fb->get_packed_size(master_data);
> > > + delete fb;
> > >
> >
> > Ugh... to allocate a new object just to get the size seems like a
> > waste of cycles. To allocate a new object *dynamically* (on the
> > heap) just to
> > get the size is really a waste of cycles. Can't you get the
> size some
> > other way?
>
> Yes, this was a source of much brainstorming by Andrei and I.
> We finally decided that this was the only way we could handle
> getting the field metadata. Notice that we added a new
> constructor to do this that should make for a fairly execution.
>
>
> General
> -------
> If you see anything else you didn't spot the first time or
> have additional questions, let me know and I'll be available
> to talk through them via Skype or IRC.
>
> I plan to be in to work on Friday @0900 EDT. I will begin
> getting ready for the push and hopefully everything is Ok
> with the second review and I can start the push tomorrow morning.
>
> For the rest of this afternoon I will finish running the full
> test suite in order to try and minimize any possible issues
> during pushbuild for both this patch and the one for WL#3915
> which relies on this patch. I know it's no guarantee, but it
> should eliminate most of the DOH!'s. :)
>
> Chuck
>
>
> --
> MySQL Code Commits Mailing List
> For list archives: http://lists.mysql.com/commits
> To unsubscribe:
> http://lists.mysql.com/commits?unsub=1
>