Hi Andrei,
Thanks for working on this. I think your analysis is correct and your
patch will fix the bug. As discussed on phone, I think it would be
better if the slave's IO thread did not process the binlog before
writing it to the relay log, but your solution is ok for the current bug.
I have three comments on your patch:
(1) I don't think we should use asserts to check input. See comments
inline.
(2) Some comments can be improved. See comments inline.
(3) If possible, I think we should try to create a test case, as
discussed on phone.
Merry Christmas,
/Sven
Andrei Elkin wrote:
> #At
> file:///home/andrei/MySQL/BZR/FIXES/5.0-bt-bug47142-until_catches_local_rotate-andrii/
> based on revid:joro@stripped
>
> 2845 Andrei Elkin 2009-11-13
> Bug #47142 "slave start until" stops 1 event too late in 4.1 to 5.0
> replication
>
> When replicating from 4.1 master to 5.0 slave START SLAVE UNTIL can stop too
> late.
> The necessary in calculating of the beginning of an event the event's length
> did not correspond to the master's genuine information at the event's execution
> time.
> That piece of info was changed at the event's relay-logging.
>
> Fixed with storing the master genuine Query_log_event size into a new status
> variable at relay-logging of the event. The stored info is extacted at the
> event
> execution and participate further in the until-pos stopping routine.
> The new status variable's algorithm will be only active when the event comes
> from the master of version less than 5.0.
>
> @ sql/log_event.cc
> Storing and extracting the master's genuine size of the event from the
> status
> var of the event packet header.
> @ sql/log_event.h
> Incrementing the max szie of MAX_SIZE_LOG_EVENT_STATUS because of the new
> status var;
> Defining the new status variable to hold the master's genuine event size;
> Augmending the Query_log_event with a new member to hold a value to
> store/extact from the status
> var of the event packet header.
>
> modified:
> sql/log_event.cc
> sql/log_event.h
> === modified file 'sql/log_event.cc'
> --- a/sql/log_event.cc 2009-08-12 03:54:05 +0000
> +++ b/sql/log_event.cc 2009-11-13 19:20:40 +0000
> @@ -1271,10 +1271,20 @@ bool Query_log_event::write(IO_CACHE* fi
> int8store(start, table_map_for_update);
> start+= 8;
> }
> + if (master_data_written != 0)
> + {
> + /*
> + the event is originated by an old master
I would suggest to clarify the comment slightly:
/*
Q_MASTER_DATA_WRITTEN_CODE only exists in relay logs where the master
has binlog_version<4 and the slave has binlog_version=4. See comment
for master_data_written in log_event.h for details.
*/
> + */
> + *start++= Q_MASTER_DATA_WRITTEN_CODE;
> + int4store(start, master_data_written);
> + start+= 4;
> + }
> +
> /*
> NOTE: When adding new status vars, please don't forget to update
> - the MAX_SIZE_LOG_EVENT_STATUS in log_event.h and update function
> - code_name in this file.
> + the MAX_SIZE_LOG_EVENT_STATUS in log_event.h and update the function
> + code_name() in this file.
>
> Here there could be code like
> if (command-line-option-which-says-"log_this_variable" && inited)
> @@ -1354,6 +1364,7 @@ Query_log_event::Query_log_event(THD* th
> auto_increment_offset(thd_arg->variables.auto_increment_offset),
> lc_time_names_number(thd_arg->variables.lc_time_names->number),
> charset_database_number(0),
> + master_data_written(0),
> table_map_for_update((ulonglong)thd_arg->table_map_for_update)
> {
> time_t end_time;
> @@ -1481,6 +1492,7 @@ code_name(int code)
> case Q_LC_TIME_NAMES_CODE: return "Q_LC_TIME_NAMES_CODE";
> case Q_CHARSET_DATABASE_CODE: return "Q_CHARSET_DATABASE_CODE";
> case Q_TABLE_MAP_FOR_UPDATE_CODE: return "Q_TABLE_MAP_FOR_UPDATE_CODE";
> + case Q_MASTER_DATA_WRITTEN_CODE: return "Q_MASTER_DATA_WRITTEN_CODE";
> }
> sprintf(buf, "CODE#%d", code);
> return buf;
> @@ -1518,7 +1530,7 @@ Query_log_event::Query_log_event(const c
> flags2_inited(0), sql_mode_inited(0), charset_inited(0),
> auto_increment_increment(1), auto_increment_offset(1),
> time_zone_len(0), lc_time_names_number(0), charset_database_number(0),
> - table_map_for_update(0)
> + table_map_for_update(0), master_data_written(0)
> {
> ulong data_len;
> uint32 tmp;
> @@ -1574,6 +1586,17 @@ Query_log_event::Query_log_event(const c
> DBUG_PRINT("info", ("Query_log_event has status_vars_len: %u",
> (uint) status_vars_len));
> tmp-= 2;
> + }
> + else
> + {
> + /*
> + old (of server version < 5.0) master's event will be relay-logged
> + to skew the original log_pos.
> + log_pos is saved in a status variable to restore at reading the event
> + from the relay log.
I think this comment is not correct: it is not the original log_pos
(=offset from the beginning of the master binlog file to the beginning
of the event) that is stored in status_vars. It is the length of the
event that is stored. We could write something like:
/*
This happens in the IO thread when it reads an event from
binlog_version < 4. Here, we save the size that the event had in
the master's binlog. See comment for master_data_written in
log_event.h.
*/
> + */
> + DBUG_ASSERT(description_event->binlog_version < 4);
This will crash the server if the binlog is malformed. Please do
something like this instead:
if (description_event->binlog_version >= 4) {
DBUG_PRINT("info", ("malformed binlog: post-header too small in "
"event found in binlog version >= 4."))
query= 0;
DBUG_RETURN_VOID;
}
> + master_data_written= data_written;
> }
> /*
> We have parsed everything we know in the post header for QUERY_EVENT,
> @@ -1665,6 +1688,12 @@ Query_log_event::Query_log_event(const c
> table_map_for_update= uint8korr(pos);
> pos+= 8;
> break;
> + case Q_MASTER_DATA_WRITTEN_CODE:
> + CHECK_SPACE(pos, end, 4);
> + data_written= master_data_written= uint4korr(pos);
> + pos+= 4;
> + DBUG_ASSERT(description_event->binlog_version >= 4);
This will crash the server if the binlog it gets from the network is
malformed. We should never crash on bad input. Please do something like
this instead:
if (description_event->binlog_version < 4) {
DBUG_PRINT("info", ("malformed binlog: Q_MASTER_DATA_WRITTEN_CODE "
"found in binlog version < 4"));
query= 0;
DBUG_RETURN_VOID;
}
> + break;
> default:
> /* That's why you must write status vars in growing order of code */
> DBUG_PRINT("info",("Query_log_event has unknown status vars (first has\
>
> === modified file 'sql/log_event.h'
> --- a/sql/log_event.h 2009-03-27 05:19:50 +0000
> +++ b/sql/log_event.h 2009-11-13 19:20:40 +0000
> @@ -212,7 +212,8 @@ struct sql_ex_info
> 1 + 1 + 255 /* type, length, time_zone */ + \
> 1 + 2 /* type, lc_time_names_number */ +
> \
> 1 + 2 /* type, charset_database_number
> */ + \
> - 1 + 8 /* type, table_map_for_update */)
> + 1 + 8 /* type, table_map_for_update */ +
> \
> + 1 + 4 /* type, master_data_written */)
> #define MAX_LOG_EVENT_HEADER ( /* in order of Query_log_event::write */ \
> LOG_EVENT_HEADER_LEN + /* write_header */ \
> QUERY_HEADER_LEN + /* write_data */ \
> @@ -278,6 +279,9 @@ struct sql_ex_info
> #define Q_CHARSET_DATABASE_CODE 8
>
> #define Q_TABLE_MAP_FOR_UPDATE_CODE 9
> +
> +#define Q_MASTER_DATA_WRITTEN_CODE 10
> +
> /* Intvar event post-header */
>
> #define I_TYPE_OFFSET 0
> @@ -810,6 +814,13 @@ public:
> statement, for other query statements, this will be zero.
> */
> ulonglong table_map_for_update;
> + /*
> + placeholder for data_written of a Query_log_event originated by an old
> + (version < 5.0) master. The geniune master's value is necessary
> + to calculate the master size event's parameters out of log_pos
> + for instance the beginning of position of the event (log_pos - data_written)
> + */
> + uint32 master_data_written;
This comment is not completely clear. I think we need to explain how and
why log_pos/data_written is changed. I think the comment can say
something like:
/*
Events originating from a master of binlog_version<4 (i.e., server
version < 5.0) are converted by the IO thread to binlog_version=4
before they are written to the relay log. The conversion modifies
binlog positions in two ways:
(1) The log_pos changes. In binlog_version<4, log_pos was the offset
of the beginning of the event in the master's binlog. In
binlog_version=4, log_pos is the offset of the end of the event
in the master's binlog.
(2) The event size grows because the status_vars fields are added.
In the slave SQL thread, UNTIL conditions of the form
"UNTIL MASTER_LOG_FILE = x, MASTER_LOG_POS = y" are tested as follows:
log_pos - event_size >= until_master_log_pos
If this holds, the slave SQL thread stops. Since log_pos is still the
end of the event in the master's binlog and until_master_log_pos is
an offset in the master's binlog, event_size must also be the size of
the event in the master's binlog. Therefore, when the slave IO thread
converts the event, it adds a special status variable that holds the
length of the event in the master's binlog. The status variable is
called Q_MASTER_DATA_WRITTEN_CODE and is stored in
Log_event::master_data_written.
*/
>
> #ifndef MYSQL_CLIENT
>
>
>
>
> ------------------------------------------------------------------------
>
>
--
Sven Sandberg, Software Engineer
MySQL AB, www.mysql.com