List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:June 25 2007 1:10pm
Subject:Re: bk commit into 5.0 tree (tnurnberg:1.2502) BUG#22540
View as plain text  
Hello,

On Mon, Jun 25, 2007 at 06:59:28AM +0200, Tatjana A Nuernberg wrote:
> ChangeSet@stripped, 2007-06-25 06:59:20+02:00, tnurnberg@stripped +3 -0
>   Bug #22540: Incorrect value in column End_log_pos of SHOW BINLOG EVENTS using
> InnoDB
>   
>   fix binlog-writing so that end_log_pos is given correctly even
>   within transactions for both SHOW BINLOG and SHOW MASTER STATUS,
>   that is as absolute values (from log start) rather than relative
>   values (from transaction's start).


> --- 1.210/sql/log.cc	2007-04-27 15:33:46 +02:00
> +++ 1.211/sql/log.cc	2007-06-25 06:59:18 +02:00
> @@ -1835,7 +1835,9 @@ bool MYSQL_LOG::write(THD *thd, IO_CACHE
>  
>    if (likely(is_open()))                       // Should always be true
>    {
> -    uint length;
> +    uint length, group, carry, hdr_offs;
> +    long val;

group and val are of the same nature (they are end_log_pos), should be
both uint (uint is enough as they are encoded in 4 bytes in the event,
using uint4korr/int4store).

> +    byte header[LOG_EVENT_HEADER_LEN];
>  
>      /*
>        Log "BEGIN" at the beginning of the transaction.
> @@ -1867,10 +1869,59 @@ bool MYSQL_LOG::write(THD *thd, IO_CACHE
>      /* Read from the file used to cache the queries .*/
>      if (reinit_io_cache(cache, READ_CACHE, 0, 0, 0))
>        goto err;
> -    length=my_b_bytes_in_cache(cache);
> +
> +    length= my_b_bytes_in_cache(cache);
>      DBUG_EXECUTE_IF("half_binlogged_transaction", length-=100;);

Please put a comment explaining what you are doing below, for future
devs. It could explain that the events in the cache have an incorrect
end_log_pos, as the offset of the cache's start in the log was not
known when these events were put in the cache, and now that we know
the offset of the cache in the log we go through all event headers in
this cache and fix their end_log_pos on the fly.


> +    group= my_b_tell(&log_file);
> +    hdr_offs= carry= 0;
>      do
>      {
> +      if (likely(carry > 0))

carry>0 is only if an event's header (4 bytes) was split. First, a
split requires that the cache contains more than 32KB and its
probability should not be more than 4/32768, I think you could use
unlikely() instead of likely().

> +      {
> +        DBUG_ASSERT(carry < LOG_EVENT_HEADER_LEN);
> +
> +        memcpy(&header[carry], (char *)cache->read_pos, LOG_EVENT_HEADER_LEN
> - carry);
> +
> +        val= uint4korr(&header[LOG_POS_OFFSET]) + group;
> +        int4store(&header[LOG_POS_OFFSET], val);
> +
> +        if (my_b_write(&log_file, header, carry))
> +          goto err;
> +
> +        memcpy((char *)cache->read_pos, &header[carry], LOG_EVENT_HEADER_LEN
> - carry);
> +
> +        hdr_offs = LOG_EVENT_HEADER_LEN - carry +
> +          uint4korr(&header[EVENT_LEN_OFFSET]);
> +
> +        carry= 0;
> +      }
> +
> +      do {
> +        DBUG_ASSERT((hdr_offs + max(EVENT_LEN_OFFSET, LOG_POS_OFFSET) + 4) <=
> length);
> +
> +        val= uint4korr((char *)cache->read_pos + hdr_offs + LOG_POS_OFFSET) +
> group;
> +        int4store((char *)cache->read_pos + hdr_offs + LOG_POS_OFFSET, val);

int4store() is on some platforms:
#define int4store(T,A)       do { *((char *)(T))=(char) ((A));\
                                  *(((char *)(T))+1)=(char) (((A) >> 8));\
                                  *(((char *)(T))+2)=(char) (((A) >> 16));\
                                  *(((char *)(T))+3)=(char) (((A) >> 24)); }
while(0)

so to avoid computing the same additions 4 times, you could store the
result of the addition into some variable on the stack.

> +        hdr_offs += uint4korr((char *)cache->read_pos + hdr_offs +
> EVENT_LEN_OFFSET);

same for this uint4korr

> +
> +        /* header beyond current read-buffer? */
> +        if (hdr_offs >= length)
> +        {
> +          hdr_offs -= length;
> +          break;
> +        }

I think the if() block above should be moved to the start of the
do...while(), otherwise long events will cause problems. I try to
explain why below.
The cache's in-memory-buffer has a default size of 32KB (default value
of "binlog_cache_size").
Assume that in the cache, after some small events there is an event
long of 100KB (INSERT of big BLOB).
hdr_offs is ~100KB, it's >=length (32KB), so hdr_offs is set to
~68KB and we break out of the inner loop. 'carry' is 0.
Then we come to:
      do {
        DBUG_ASSERT((hdr_offs + max(EVENT_LEN_OFFSET, LOG_POS_OFFSET) + 4) <= length);
        val= uint4korr((char *)cache->read_pos + hdr_offs + LOG_POS_OFFSET) + group;

The assertion will fail.
In an non-debug build, the uint4korr will read memory too far (as 68KB
is still beyond 32KB).


> +
> +        /* split header? */
> +        if (hdr_offs + LOG_EVENT_HEADER_LEN > length)
> +        {
> +          carry= length - hdr_offs;
> +
> +          memcpy(header, (char *)cache->read_pos + hdr_offs, carry);
> +          length -= carry;

no space before -= according to the coding guidelines.
length -= carry;
can be replaced by
length= hdr_offs.

> +        }
> +
> +      } while (hdr_offs < length);
> +
>        /* Write data to the binary log file */
>        if (my_b_write(&log_file, cache->read_pos, length))
>  	goto err;
> 
> --- 1.9/mysql-test/r/binlog.result	2007-06-15 23:57:23 +02:00
> +++ 1.10/mysql-test/r/binlog.result	2007-06-25 06:59:18 +02:00

> +show binlog events from 0;
> +Log_name	Pos	Event_type	Server_id	End_log_pos	Info
> +master-bin.000001	4	Format_desc	1	98	Server version, Binlog ver: 4
> +master-bin.000001	98	Query	1	197	use `test`; create table t1(n int) engine=innodb
> +master-bin.000001	197	Query	1	265	use `test`; BEGIN
> +master-bin.000001	265	Query	1	353	use `test`; insert into t1 values (1)
> +master-bin.000001	353	Query	1	441	use `test`; insert into t1 values (2)
> +master-bin.000001	441	Query	1	529	use `test`; insert into t1 values (3)
> +master-bin.000001	529	Xid	1	556	COMMIT /* XID */
> +master-bin.000001	556	Query	1	632	use `test`; drop table t1

Feels good to see this nice sequence of numbers, "starts at 197 and
ends at 265" "starts at 265 and ends at 353" etc. Pleasant music to
the ears. Thank you for the bugfix!!

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /    Mr. Guilhem Bichot <guilhem@stripped>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Lead Software Engineer
/_/  /_/\_, /___/\___\_\___/   Bordeaux, France
       <___/   www.mysql.com   
Thread
bk commit into 5.0 tree (tnurnberg:1.2502) BUG#22540Tatjana A Nuernberg25 Jun
  • Re: bk commit into 5.0 tree (tnurnberg:1.2502) BUG#22540Guilhem Bichot25 Jun