List:Internals« Previous MessageNext Message »
From:Mats Kindahl Date:May 13 2011 9:43am
Subject:Re: WL#2540, binlog checksums, interoperability between different
versions
View as plain text  
On 05/09/2011 12:50 PM, Kristian Nielsen wrote:
> Mats Kindahl <mats.kindahl@stripped> writes:
>
>> Hmm... not sure what code you're looking at, but I'm looking at
>> rpl_slave.cc line 1696 (or so) and AFAICT, there is no version
>> checking there. There are some version checking to give errors for
>> some specific bugs later in the file (line 5393 or so), but that does
>> not have anything to do with the checksum implementation.
> I am looking at latest lp:mysql-server (as of today),
> Format_description_log_event() constructor. I checked that the code is the
> same in the MySQL 5.6.2-m5 source tarball on dev.mysql.com/downloads. (This is
> just MySQL code, no MariaDB):
>
>   calc_server_version_split();
>   if ((ver_calc= get_version_product()) >= checksum_version_product)
>   {
>     /* the last bytes are the checksum alg desc and value (or value's room) */
>     number_of_event_types -= BINLOG_CHECKSUM_ALG_DESC_LEN;
>     /*
>       FD from the checksum-home version server (ver_calc ==
>       checksum_version_product) must have 
>       number_of_event_types == LOG_EVENT_TYPES.
>     */
>     DBUG_ASSERT(ver_calc != checksum_version_product ||
>                 number_of_event_types == LOG_EVENT_TYPES);
>     checksum_alg= post_header_len[number_of_event_types];
>   }
>   else
>   {
>     checksum_alg= (uint8) BINLOG_CHECKSUM_ALG_UNDEF;
>   }
>
> This code will set checksum_alg to BINLOG_CHECKSUM_ALG_UNDEF if the server
> version in the received format description event is < 5.6.1.
>
> And then later in read_log_event():
>
>     if (alg != BINLOG_CHECKSUM_ALG_UNDEF &&
>         (event_type == FORMAT_DESCRIPTION_EVENT ||
>          alg != BINLOG_CHECKSUM_ALG_OFF))
>       event_len= event_len - BINLOG_CHECKSUM_LEN;
>
> The result is that if the master sends a format_description_event with server
> version string < 5.6.1 followed by an event with checksum at the end, the
> slave will incorrectly interpret the 4 bytes of checksum as part of the event
> payload (the above subtraction from event_len will not be run).
>
> Do you see the problem? The slave requests to receive checksums, the master
> sends them back, but then the slave looks at the format_description event, and
> decides that events are without checksums. The result is that eg. checksums
> are interpreted as extra garbage bytes at end of query strings etc.

Oh, so right you are... :)

>> in general, tying any specific property of the server to a version
>> number is likely to cause problems. One you have already discovered
>> (you would need to change the Oracle MySQL server), but there are some
>> other issues as well. In general, it goes something like this:
>>
>>     * The server has some set of properties taken from P = { A, B, C,
>>       ... }
>>     * The server has a version
>>
>> Now, if you use the version number V to figure out the set of
>> properties {A, B, C}, you do this in an indirect manner based on the
>> assumption that the mapping 'properties :: V --> 2^P' is static, but
>> in general it cannot be. Consider these cases:
>>
>>     * Some property is in a plugin and loaded dynamically. In that
>>       case, you cannot look at the server version to decide if it has
>>       property A.
>>     * Server version numbering scheme changes (Just an example is
>>       where the server is forked. Even inside Oracle, there are  two
>>       lines of development of the server with NDB on one branch and
>>       the main server on another.) In this case, you need to do
>>       something like what you are outlining above to be able to deduce
>>       what property a server supports based on the version number.
>>     * Server version numbering is not monotone, that is, some
>>       properties are removed from later servers. This would require
>>       you to hard-code an elaborate scheme for deciding if a server
>>       has a certain property based on the version of the server.
>>
>> Having a direct scheme, asking the question "do you have property A"
>> is superior to asking "what version is you" and then deducing using
>> private notes that the server should have property A.
> Yes, I fully agree :-) And good point about NDB, I would expect there will be
> problems replicating with checksums enabled just between mysql and telco when
> wl#2540 is merged into -telco.
>
> Do you see now from above explanation where the WL#2540 in current MySQL
> lp:mysql-server source is using version to detect capability?
>
> In this case the problem is slightly complicated, as we do not only need the
> capabilities of the server sending the event, we need the capabilities of the
> server that wrote the event. Eg. if MySQL 5.6.2 is sending old events written
> by MySQL 5.5.8 before upgrading. So it is not enough for slave to query
> capabilities of the running server, it needs also to check capabilities of the
> server that wrote the events.
>
> As a suggestion on how to improve this, how about adding a new event flag?
>
>     #define LOG_EVENT_CHECKSUMMED_F 0x200
>
> This flag will be set on events that have checksums. Then there is no need to
> look at previous format_description_event to guess if current event has
> checksum embedded or not, just look at the flag. And a lot of scary logic
> related to relay logs where some events are checksummed and some are not can
> be eliminated.
>
> What do you think?

Yes, we need some sort of way of detecting the checksum algorithm
(with "NONE" being one) but looking at the flag for all events does
not work since it can be corrupted just like everything else.

    * If the flag says "has checksum" while it really doesn't have it,
      we're fine since the checksum algorithm will detect this.
    * If the flag says "does not have checksum" while it has it, we're
      screwed. Not checking the checksum means that we go out of sync
      w.r.t. the event borders (there is a checksum at the end of the
      body, which will interfere with the reading of the event).

The problem is that we have to know whether we can run the checksum
algorithm or not. If we know that we always can run the checksum
algorithm, we will know if the file has checksums or not.

Also note that in the handshake protocol that I mentioned in the
previous mail, you already know if the server support checksums: this
part is about if the event was written with a checksum or not "once
upon a time". So, if we have a new server, we can have it send
something that is always checksummed and which indicate whether the
*binary log* has a checksum or not.

The best way to handle this is to offer a a guarantee that we will
have a checksum on, e.g., the first event. However, we cannot use the
Format description log event since adding a checksum where there is
none can upset the slave since it alters the offsets into the binary
log. However, the first event sent is a fake rotate event used only to
indicate which binary log is the "first" binary log. In a similar way,
we can actually add a checksum to the Rotate event at the end of a
binary log file since there will be nothing after it. In other words,
if we always checksum the Rotate events and add a flag there telling
of the *next* binary log has a checksum, we're home free.

    * We can distinguish new and old servers.
    * When we receive a Rotate event from a new server, we compute the
      checksum to see if it is corrupt.
          o If it is corrupt, we just stop the slave.
    * If it is not corrupt, we can check the flag and see if the next
      binary log will be checksummed or not.

This means that we will always know if the event has a checksum or not.

Just my few cents,
Mats Kindahl
Thread
WL#2540, binlog checksums, interoperability between different versionsKristian Nielsen9 May
  • Re: WL#2540, binlog checksums, interoperability between differentversionsMats Kindahl9 May
    • Re: WL#2540, binlog checksums, interoperability between different versionsKristian Nielsen9 May
      • Re: WL#2540, binlog checksums, interoperability between differentversionsMats Kindahl13 May
  • Re: WL#2540, binlog checksums, interoperability between different versionsAndrei Elkin11 May