List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:June 27 2007 10:27am
Subject:Re: bk commit into 5.0 tree (tnurnberg:1.2503) BUG#22540
View as plain text  
Hello,

On Mon, Jun 25, 2007 at 06:41:21PM +0200, Tatjana A Nuernberg wrote:
> ChangeSet@stripped, 2007-06-25 18:41:17+02:00, tnurnberg@stripped +1 -0
>   Bug#22540: Incorrect value in column End_log_pos of SHOW BINLOG EVENTS using
> InnoDB
>   
>   end_log_pos data within a transaction are relative to
>   the start of the transaction rather than absolute.
>   we fix those groups in situ before writing the log out.
>   
>   additional comments and handling for groups with very
>   large single events, as suggested by Guilhem.
> 
>   sql/log.cc@stripped, 2007-06-25 18:41:15+02:00, tnurnberg@stripped +73
> -27
>     Bug#22540: Incorrect value in column End_log_pos of SHOW BINLOG EVENTS using
> InnoDB
>     
>     end_log_pos data within a transaction are relative to
>     the start of the transaction rather than absolute.
>     we fix those groups in situ before writing the log out.
>     
>     additional comments and handling for groups with very
>     large single events, as suggested by Guilhem.

> --- 1.211/sql/log.cc	2007-06-25 11:34:22 +02:00
> +++ 1.212/sql/log.cc	2007-06-25 18:41:15 +02:00
> @@ -1835,8 +1835,7 @@ bool MYSQL_LOG::write(THD *thd, IO_CACHE
>  
>    if (likely(is_open()))                       // Should always be true
>    {
> -    uint length, group, carry, hdr_offs;
> -    long val;
> +    uint length, group, carry, hdr_offs, val;
>      byte header[LOG_EVENT_HEADER_LEN];
>  
>      /*
> @@ -1873,57 +1872,104 @@ bool MYSQL_LOG::write(THD *thd, IO_CACHE
>      length= my_b_bytes_in_cache(cache);
>      DBUG_EXECUTE_IF("half_binlogged_transaction", length-=100;);
>  
> +    /*
> +      The events in the buffer have incorrect end_log_pos data
> +      (relative to beginning of group rather than absolute),
> +      so we'll recalculate them in situ so the binlog is always
> +      correct, even in the middle of a group. This is possible
> +      because we now know the start position of the group (the
> +      offset of this cache in the log, if you will); all we need
> +      to do is to find all event-headers, and add the position of
> +      the group to the end_log_pos of each event.  This is pretty
> +      straight forward, except that we read the cache in segments,
> +      so an event-header might end up on the cache-border and get
> +      split.
> +     */
> +
>      group= my_b_tell(&log_file);
>      hdr_offs= carry= 0;
>  
>      do
>      {
> -      if (likely(carry > 0))
> +
> +      /*
> +        if we only got a partial header in the last iteration,
> +        get the other half now and process a full header.
> +      */
> +      if (unlikely(carry > 0))
>        {
>          DBUG_ASSERT(carry < LOG_EVENT_HEADER_LEN);
>  
> +        /* assemble both halves */
>          memcpy(&header[carry], (char *)cache->read_pos, LOG_EVENT_HEADER_LEN
> - carry);
>  
> +        /* fix end_log_pos */
>          val= uint4korr(&header[LOG_POS_OFFSET]) + group;
>          int4store(&header[LOG_POS_OFFSET], val);
>  
> +        /* write the first half of the split header */
>          if (my_b_write(&log_file, header, carry))
>            goto err;
>  
> +        /*
> +          copy fixed second half of header to cache so the correct
> +          version will be written later.
> +        */
>          memcpy((char *)cache->read_pos, &header[carry], LOG_EVENT_HEADER_LEN
> - carry);
>  
> +        /* next event header at ... */
>          hdr_offs = LOG_EVENT_HEADER_LEN - carry +
>            uint4korr(&header[EVENT_LEN_OFFSET]);

This formula looks wrong.
The offset of the start of the current event's header is negative: it's
"-carry".
uint4korr(&header[EVENT_LEN_OFFSET]) is the total length of the
current event, header included.
The offset of the start of the next event's header is
uint4korr(&header[EVENT_LEN_OFFSET]) bytes after the offset of the
start of the current event's header, so it's at
"uint4korr(&header[EVENT_LEN_OFFSET]) - carry".

This patch will be used by all users of transactional tables as soon
as they make a write to the table. It's critical to have no bug in
it; a wrong hdr_offs like the one above corrupts the event's header
and so makes the binlog un-parsable, making point-in-time recovery and
replication impossible. So, I would like you to ask you for more
coverage of this patch: could you please test this scenario:
- transactional engine (InnoDB)
- several binlog events, some of which span several cache-size, others
which have a "carry>0" (you could gdb to see that these two facts
actually happen).
- you could adjust binlog_cache_size to a smaller value than 32K for
easier testing.
It should end up in a .test file. In the testsuite we cannot
assert that carry>0 is true, but it will be initially (as you will
have tailored the events for that), so we will have good initial
coverage of the patch (until the events' format is later changed,
making them longer, then carry>0 may not be true anymore).
Thank you.

>  
>          carry= 0;
>        }
>  
> +      /* if there is anything to write, process it. */
> +
>        if(likely(length > 0))
>        {
> -        do {
> -          DBUG_ASSERT((hdr_offs + max(EVENT_LEN_OFFSET, LOG_POS_OFFSET) + 4) <=
> length);
> -
> -          val= uint4korr((char *)cache->read_pos + hdr_offs + LOG_POS_OFFSET) +
> group;
> -          int4store((char *)cache->read_pos + hdr_offs + LOG_POS_OFFSET, val);
> -          hdr_offs += uint4korr((char *)cache->read_pos + hdr_offs +
> EVENT_LEN_OFFSET);
> -
> -          /* header beyond current read-buffer? */
> -          if (hdr_offs >= length)
> -          {
> -            hdr_offs -= length;
> -            break;
> -          }
> -
> -          /* split header? */
> -          if (hdr_offs + LOG_EVENT_HEADER_LEN > length)
> -          {
> -            carry= length - hdr_offs;
> -
> -            memcpy(header, (char *)cache->read_pos + hdr_offs, carry);
> -            length -= carry;
> -          }
> -
> -        } while (hdr_offs < length);
> +        /*
> +          next header beyond current read-buffer? we'll get it later
> +          (though not necessarily in the very next iteration).
> +        */
> +
> +        if (hdr_offs >= length)
> +          hdr_offs -= length;
> +        else
> +        {
> +
> +          /* process all event-headers in this (partial) cache. */
> +
> +          do {
> +
> +            /*
> +              partial header only? save what we can get, process once
> +              we get the rest.
> +            */
> +
> +            if (hdr_offs + LOG_EVENT_HEADER_LEN > length)
> +            {
> +              carry= length - hdr_offs;
> +              memcpy(header, (char *)cache->read_pos + hdr_offs, carry);
> +              length= hdr_offs;
> +            }
> +            else
> +            {
> +              /* we've got a full event-header, and it came in one piece */
> +
> +              char *log_pos= cache->read_pos + hdr_offs + LOG_POS_OFFSET;
> +
> +              /* fix end_log_pos */
> +              val= uint4korr(log_pos) + group;
> +              int4store(log_pos, val);
> +
> +              /* next event header at ... */
> +              log_pos= (char *)cache->read_pos + hdr_offs + EVENT_LEN_OFFSET;
> +              hdr_offs += uint4korr(log_pos);
> +
> +            }
> +          } while (hdr_offs < length);
> +        }
>        }
>  
>        /* Write data to the binary log file */
> 
> -- 
> MySQL Code Commits Mailing List
> For list archives: http://lists.mysql.com/commits
> To unsubscribe:    http://lists.mysql.com/commits?unsub=1
> 

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /    Mr. Guilhem Bichot <guilhem@stripped>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Lead Software Engineer
/_/  /_/\_, /___/\___\_\___/   Bordeaux, France
       <___/   www.mysql.com   
Thread
bk commit into 5.0 tree (tnurnberg:1.2503) BUG#22540Tatjana A Nuernberg26 Jun
  • Re: bk commit into 5.0 tree (tnurnberg:1.2503) BUG#22540Guilhem Bichot27 Jun