Hi Mats,
Here is the link to my latest patch:
http://lists.mysql.com/commits/31644
Below are my responses to your comments and suggestions. Those not listed or
not commented upon were implemented as requested.
> > +const char *Field_new_decimal::unpack(char* to,
> > + const char *from,
> > + uint from_length)
> > +{
> > + uint from_precision= (from_length & 0xff00) >> 8U;
> > + uint from_decimal= from_length & 0x00ff;
> > + uint length=pack_length();
> > + uint from_pack_len=
> my_decimal_get_binary_size(from_precision, from_decimal);
> > + uint len= (from_length && (from_pack_len < length)) ?
> > + from_pack_len : length;
> >
>
> Can from_length be 0 for a DECIMAL? I don't think it can (after
> checking, I'm sure it can't be), so the check for
> "from_length" in the
> conjunction above is unnecessary (but an assertion does not
> hurt here).
Yes, it can if we are replicating with an older version server. In that
case, the from_length (hence renamed to param_data) can be 0.
>
> > + if (from_pack_len && (from_pack_len < length))
> >
>
> Can from_pack_len be zero (I'm very sure it can't, check inside
> decimal_bin_size())? Otherwise, the check is unnecessary.
See above.
>
> > + decimal2double(&dec, &dbl);
> > + double2decimal(dbl, &dec);
> >
>
> These two statements are very slow: the latter one is converting the
> double to a string before actually converting it to a
> decimal. I don't
> know if it is possible to convert them by working with the decimal_t
> buffer (it seems feasible), but that would definitely speed things up.
They have been eliminated. :)
> > +const char *Field_string::unpack(char *to, const char
> *from, uint from_length)
> > +{
> > + uint from_len= (from_length & 0xff00) >> 8U; // real_type.
> > + from_len= from_length & 0x00ff; // length.
> >
>
> Here you overwrite the value of from_len with the lower half of
> from_length: is this really intentional? Note: from_len is
> now limited
> to the range [0,255], but both CHAR and VARCHAR can have a
> parameter in
> the range [0,2^16). I also suggest adding const qualifiers to values
> that you don't intend to change: it is easier to catch this kind of
> mistakes that way; i.e.:
>
> uint const from_len= ...;
I do not need the upper part of param_data here. It stores the real_type()
which is only used by the WL#3915 patch. Note that the documentation states
CHAR fields must be <= 255 bytes. The server throws an error if you attempt
to allocate more than 255 bytes to a CHAR field.
>
> > + uint length= 0;
> > + uint f_length= (from_length && (from_len < field_length)) ?
> > + from_len : field_length;
> > + if (f_length > 255)
> >
>
> Umh... when field_length is > 255, it can never be the case that
> from_len < field_length is false since from_len is bounded to
> the range
> [0,255]. This means that if the field length on the master is larger
> than 255 and the field length on the slave is the same as on
> the master,
> the expression from_len < field_length is false even though the
> field_length on the master and slave are identical.
See refactored code.
>
> > + {
> > + length= uint2korr(from);
> > + from+= 2;
> > + }
> > + else
> > + length= (uint) *from++;
> > + memcpy(to, from, (int) length);
> > + bfill(to+length, field_length - length, ' ');
> >
>
> Isn't there a fill function for CHAR and VARCHAR fields?
> Hardcoding it
> to space introduce the possibility of duplicating code, which in turn
> can introduce subtle bugs. Indeed, in the Field_string::store()
> function, there is the following code to actually perform the padding:
>
> if (copy_length < field_length)
> field_charset->cset->fill(field_charset,(char*) ptr+copy_length,
> field_length-copy_length,
> field_charset->pad_char);
>
> I actually think that you can use the Field_string::store() function
> instead of rolling your own version as you've done above.
Store is now used where possible.
>
> > +/*
> >
>
> I think you can use Field_varstring::store() here as well,
> similar to above.
See above.
> > + for blob fields.
> > +*/
> > +const char *Field_blob::unpack(char *to, const char *from,
> uint from_length)
> > +{
> > + return unpack(to, from);
> > +}
> >
>
> Are you storing the from_length (or param_data, or whatever it is
> called) for BLOB fields or are you optimizing it away? My personal
> opinion is that you should keep simple (i.e., linear)
> relations between
> positions in the fields since that allow the compiler to
> generate more
> efficient code, e.g., something like "param_data =
> param_array[column_pos]". See below for an example.
I am storing but not using it in the unpack method. This is another case of
carrying over data for the WL#3915 patch. It uses it to "skip" extra blob
fields in the master, but it needs to know what type of blob it is.
> Can n be zero in a BIT(n) field? According to the manual, it
> can't. No
> harm in replacing this if with an assertion though.
Ok. Interestingly, you can do this:
CREATE TABLE y (BIT(0));
...and it creates a bit field of length 1. So assert really isn't needed
here.
>
> > + {
> > + /*
> > + set_rec_bits is a macro, don't put the post-increment in the
> > + argument since that might cause strange side-effects.
> > +
> > + For the choice of the second argument, see the
> explanation for
> > + Field_bit::pack().
> > + */
> > + set_rec_bits(*from, bit_ptr + (to - ptr), bit_ofs,
> from_bit_len);
> > + from++;
> > + }
> > + memcpy(to, from, from_len);
> > + return from + from_len;
> >
>
> Check if you can use Field_bit::store() here as well. Might
> not be the
> case, but code duplication is a bad thing and should be
> avoided as much
> as possible.
See code for complete rewrite of this function per our IRC and emails.
>
> > }
> >
> >
> >
> > --- 1.226/sql/field.h 2007-07-23 17:19:38 -04:00
> > +++ 1.227/sql/field.h 2007-07-23 17:19:38 -04:00
> > @@ -342,6 +342,36 @@
> > memcpy(to,from,length);
> > return to+length;
> > }
> > + virtual const char *unpack(char* to, const char *from,
> uint from_length)
> > + {
> > + uint length=pack_length();
> > + int from_type= 0;
> > + /*
> > + If from length is > 255, it has encoded data in the
> upper bits. Need
> > + to mask it out.
> > + */
> > + if (from_length > 255)
> > + {
> > + from_type= (from_length & 0xff00) >> 8U; // real_type.
> > + from_length= from_length & 0x00ff; // length.
> > + }
> > + uint len= (from_length && (from_length < length)) ?
> > + from_length : length;
> > + /*
> > + If the length is the same, use old unpack method.
> > + If the from_length is 0, use the old unpack method.
> >
>
> What does it mean that the from_length is 0?
This is a case where NDB can produce a row that has a length of 0. I don't
know why but it comes in as length 0 but there is data in the row. So I use
the old unpack() method and that handles the situation well.
>
> > + If the real_types are not the same, use the old
> unpack method.
> >
>
> When can this occur? It seems strange that this would be
> allowed, but it
> is possible.
See above.
>
> > + /*
> > + We need to delay this clear until the table def is no
> longer needed.
> > + The table def is needed in unpack_row().
> > + */
> > + if (rli->tables_to_lock && get_flags(STMT_END_F))
> > + const_cast<RELAY_LOG_INFO*>(rli)->clear_tables_to_lock();
> >
>
> Ah! IC. But why do you need to keep the table map events
> around? Isn't
> that what table_def is to be used for? (I can see reasons to
> keeping the
> table maps around, but why are *you* doing it.)
I needed to keep the table maps around because the clear_tables_to_lock()
method deletes the table_def class instances. :( Which I extended to handle
the field metadata.
>
> > +
> > if (error)
> > { /* error has occured during the
> transaction */
> > slave_print_msg(ERROR_LEVEL, rli, thd->net.last_errno,
> > @@ -6314,6 +6318,147 @@
> > Table_map_log_event member functions and support functions
> >
> **************************************************************
> ************/
> >
> > +/**
> > + * Calculate field metadata size based on the real_type
> of the field.
> > + *
> > + * @returns int Size of field metadata.
> > + */
> >
>
> As I said before: I don't see a big advantage to keeping
> different-sized
> data for different fields, especially since you only save one
> byte for
> BLOBs (and for IEEE numbers, but they are not that common in a
> database), which are usually large. The advantages to keeping
> a constant
> size for each fields improves performance significantly (provided the
> code is written correctly).
Allow me to explain... The goal was to use as few bytes as possible in the
table map. Thus, I had to pack things in a little tight being skimpy on
bytes on the master side. However, on the slave side I'm in memory now so I
spread out and use short ints for the array to make the code easier to work
with. Had it not been for wanting to keep the size of the metadata down, I
would have used short ints everywhere.
> I think you should call it field type parameters instead.
We had a discussion about what to call it earlier and decided on field
metadata because field type information was too easily confused with the
type array already in the table map. However, I'm not opposed to renaming
it.
> After reading a little closer, I see that
> Table_map_log_event::m_field_metadata and table_def::m_field_metadata
> are "false friends" in that they have different types. Please change
> either the name of one of the fields, or use the same type for both
> fields to avoid confusion. As a side note: if you switch to the
> suggestion I had above (using virtual functions instead of a switch),
> the two fields will have the same type.
I renamed the methods and changed some of the return types to help make the
distinction. I decided to not switch to individual functions in order to
keep on schedule for release. Had I more time I think I would have liked to
explore that option. It would have made for a bit cleaner looking code in
log_event.cc.
> Since additional columns are always last in the row, why is
> this needed?
> I do agree that we need to store this field type information in the
> table map, but the above is not the reason. (What does
> "columns ... that
> do not exist" mean?)
Removed.
>
> > + m_num_null_bytes= (m_table->s->fields + 7) / 8;
> > + m_data_size+= m_num_null_bytes;
> > + m_null_bits= new uchar[m_num_null_bytes];
> > + for (uint i = 0; i < m_num_null_bytes; i++)
> > + m_null_bits[i]= 0;
> > + for (unsigned int i= 0 ; i < m_table->s->fields ; ++i)
> > + if (m_table->field[i]->maybe_null())
> > + m_null_bits[(i / 8)]+= 1 << (i % 8);
> >
>
> There is a MY_BITMAP with assorted functions to manipulate
> bitmaps. If
> you feel inclined, you can use that instead, but it is not
> necessary in
> this case since you are just generating data to save.
I'll pass since I am doing some very simple things and not setting/resetting
bits.
Ref: rpl_colSize test
> For the CHAR and VARCHAR, you need to test all combinations,
> i.e., for
> CHAR/VARCHAR column on master of type CHAR(m) and column on slave of
> type CHAR(s)
>
> Master Slave
> <256 <256 with m < s, m > s, and m == s
> >255 <256 with m < s, m > s, and m == s
> <256 >255 with m < s, m > s, and m == s
> >255 >255 with m < s, m > s, and m == s
>
> You also need to test BIT(m) (on master) and BIT(s) (on
> slave) so that:
>
> m < s, m % 8 != 0, and s % 8 == 0
> m < s, m % 8 == 0, and s % 8 != 0
> m < s, m % 8 != 0, and s % 8 != 0
> m > s, m % 8 != 0, and s % 8 == 0
> m > s, m % 8 == 0, and s % 8 != 0
> m > s, m % 8 != 0, and s % 8 != 0
>
> There is no need to bother about bit fields that where both are
> divisible by 8, since they don't store any bits among the
> null bits of
> the record.
I have placed new tests in for char and varchar and bit. Since char fields
can only be 255 bytes, the test is simply <256, <256. For varchar and bit
fields where the size of the master can be > than the slave, BUG#22086 will
implement code to detect and prevent that.
> > case MYSQL_TYPE_TINY_BLOB:
> > case MYSQL_TYPE_MEDIUM_BLOB:
> > case MYSQL_TYPE_LONG_BLOB:
> > case MYSQL_TYPE_BLOB:
> > case MYSQL_TYPE_GEOMETRY:
> > - length= ~(uint32) 0; // NYI
> > + {
> > + Field_blob *fb= new Field_blob(m_field_metadata[col]);
> > + length= fb->get_packed_size(master_data);
> > + delete fb;
> >
>
> Ugh... to allocate a new object just to get the size seems
> like a waste
> of cycles. To allocate a new object *dynamically* (on the
> heap) just to
> get the size is really a waste of cycles. Can't you get the size some
> other way?
Yes, this was a source of much brainstorming by Andrei and I. We finally
decided that this was the only way we could handle getting the field
metadata. Notice that we added a new constructor to do this that should make
for a fairly execution.
General
-------
If you see anything else you didn't spot the first time or have additional
questions, let me know and I'll be available to talk through them via Skype
or IRC.
I plan to be in to work on Friday @0900 EDT. I will begin getting ready for
the push and hopefully everything is Ok with the second review and I can
start the push tomorrow morning.
For the rest of this afternoon I will finish running the full test suite in
order to try and minimize any possible issues during pushbuild for both this
patch and the one for WL#3915 which relies on this patch. I know it's no
guarantee, but it should eliminate most of the DOH!'s. :)
Chuck