Hello,
On Mon, Jun 25, 2007 at 06:59:28AM +0200, Tatjana A Nuernberg wrote:
> ChangeSet@stripped, 2007-06-25 06:59:20+02:00, tnurnberg@stripped +3 -0
> Bug #22540: Incorrect value in column End_log_pos of SHOW BINLOG EVENTS using
> InnoDB
>
> fix binlog-writing so that end_log_pos is given correctly even
> within transactions for both SHOW BINLOG and SHOW MASTER STATUS,
> that is as absolute values (from log start) rather than relative
> values (from transaction's start).
> --- 1.210/sql/log.cc 2007-04-27 15:33:46 +02:00
> +++ 1.211/sql/log.cc 2007-06-25 06:59:18 +02:00
> @@ -1835,7 +1835,9 @@ bool MYSQL_LOG::write(THD *thd, IO_CACHE
>
> if (likely(is_open())) // Should always be true
> {
> - uint length;
> + uint length, group, carry, hdr_offs;
> + long val;
group and val are of the same nature (they are end_log_pos), should be
both uint (uint is enough as they are encoded in 4 bytes in the event,
using uint4korr/int4store).
> + byte header[LOG_EVENT_HEADER_LEN];
>
> /*
> Log "BEGIN" at the beginning of the transaction.
> @@ -1867,10 +1869,59 @@ bool MYSQL_LOG::write(THD *thd, IO_CACHE
> /* Read from the file used to cache the queries .*/
> if (reinit_io_cache(cache, READ_CACHE, 0, 0, 0))
> goto err;
> - length=my_b_bytes_in_cache(cache);
> +
> + length= my_b_bytes_in_cache(cache);
> DBUG_EXECUTE_IF("half_binlogged_transaction", length-=100;);
Please put a comment explaining what you are doing below, for future
devs. It could explain that the events in the cache have an incorrect
end_log_pos, as the offset of the cache's start in the log was not
known when these events were put in the cache, and now that we know
the offset of the cache in the log we go through all event headers in
this cache and fix their end_log_pos on the fly.
> + group= my_b_tell(&log_file);
> + hdr_offs= carry= 0;
> do
> {
> + if (likely(carry > 0))
carry>0 is only if an event's header (4 bytes) was split. First, a
split requires that the cache contains more than 32KB and its
probability should not be more than 4/32768, I think you could use
unlikely() instead of likely().
> + {
> + DBUG_ASSERT(carry < LOG_EVENT_HEADER_LEN);
> +
> + memcpy(&header[carry], (char *)cache->read_pos, LOG_EVENT_HEADER_LEN
> - carry);
> +
> + val= uint4korr(&header[LOG_POS_OFFSET]) + group;
> + int4store(&header[LOG_POS_OFFSET], val);
> +
> + if (my_b_write(&log_file, header, carry))
> + goto err;
> +
> + memcpy((char *)cache->read_pos, &header[carry], LOG_EVENT_HEADER_LEN
> - carry);
> +
> + hdr_offs = LOG_EVENT_HEADER_LEN - carry +
> + uint4korr(&header[EVENT_LEN_OFFSET]);
> +
> + carry= 0;
> + }
> +
> + do {
> + DBUG_ASSERT((hdr_offs + max(EVENT_LEN_OFFSET, LOG_POS_OFFSET) + 4) <=
> length);
> +
> + val= uint4korr((char *)cache->read_pos + hdr_offs + LOG_POS_OFFSET) +
> group;
> + int4store((char *)cache->read_pos + hdr_offs + LOG_POS_OFFSET, val);
int4store() is on some platforms:
#define int4store(T,A) do { *((char *)(T))=(char) ((A));\
*(((char *)(T))+1)=(char) (((A) >> 8));\
*(((char *)(T))+2)=(char) (((A) >> 16));\
*(((char *)(T))+3)=(char) (((A) >> 24)); }
while(0)
so to avoid computing the same additions 4 times, you could store the
result of the addition into some variable on the stack.
> + hdr_offs += uint4korr((char *)cache->read_pos + hdr_offs +
> EVENT_LEN_OFFSET);
same for this uint4korr
> +
> + /* header beyond current read-buffer? */
> + if (hdr_offs >= length)
> + {
> + hdr_offs -= length;
> + break;
> + }
I think the if() block above should be moved to the start of the
do...while(), otherwise long events will cause problems. I try to
explain why below.
The cache's in-memory-buffer has a default size of 32KB (default value
of "binlog_cache_size").
Assume that in the cache, after some small events there is an event
long of 100KB (INSERT of big BLOB).
hdr_offs is ~100KB, it's >=length (32KB), so hdr_offs is set to
~68KB and we break out of the inner loop. 'carry' is 0.
Then we come to:
do {
DBUG_ASSERT((hdr_offs + max(EVENT_LEN_OFFSET, LOG_POS_OFFSET) + 4) <= length);
val= uint4korr((char *)cache->read_pos + hdr_offs + LOG_POS_OFFSET) + group;
The assertion will fail.
In an non-debug build, the uint4korr will read memory too far (as 68KB
is still beyond 32KB).
> +
> + /* split header? */
> + if (hdr_offs + LOG_EVENT_HEADER_LEN > length)
> + {
> + carry= length - hdr_offs;
> +
> + memcpy(header, (char *)cache->read_pos + hdr_offs, carry);
> + length -= carry;
no space before -= according to the coding guidelines.
length -= carry;
can be replaced by
length= hdr_offs.
> + }
> +
> + } while (hdr_offs < length);
> +
> /* Write data to the binary log file */
> if (my_b_write(&log_file, cache->read_pos, length))
> goto err;
>
> --- 1.9/mysql-test/r/binlog.result 2007-06-15 23:57:23 +02:00
> +++ 1.10/mysql-test/r/binlog.result 2007-06-25 06:59:18 +02:00
> +show binlog events from 0;
> +Log_name Pos Event_type Server_id End_log_pos Info
> +master-bin.000001 4 Format_desc 1 98 Server version, Binlog ver: 4
> +master-bin.000001 98 Query 1 197 use `test`; create table t1(n int) engine=innodb
> +master-bin.000001 197 Query 1 265 use `test`; BEGIN
> +master-bin.000001 265 Query 1 353 use `test`; insert into t1 values (1)
> +master-bin.000001 353 Query 1 441 use `test`; insert into t1 values (2)
> +master-bin.000001 441 Query 1 529 use `test`; insert into t1 values (3)
> +master-bin.000001 529 Xid 1 556 COMMIT /* XID */
> +master-bin.000001 556 Query 1 632 use `test`; drop table t1
Feels good to see this nice sequence of numbers, "starts at 197 and
ends at 265" "starts at 265 and ends at 353" etc. Pleasant music to
the ears. Thank you for the bugfix!!
--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Guilhem Bichot <guilhem@stripped>
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Lead Software Engineer
/_/ /_/\_, /___/\___\_\___/ Bordeaux, France
<___/ www.mysql.com