List:Internals« Previous MessageNext Message »
From:ingo Date:March 22 2005 3:42pm
Subject:bk commit into 4.0 tree (ingo:1.2089) BUG#8321
View as plain text  
Below is the list of changes that have just been committed into a local
4.0 repository of mydev. When mydev does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html

ChangeSet
  1.2089 05/03/22 15:42:07 ingo@stripped +1 -0
  Bug#8321 - myisampack bug in compression algorithm
  This is the second of three changesets. It contains the pure bug fix.
  It also contains after-review fixes.
  The problem was that with gcc on x86, shifts are done modulo word size. 
  'value' is 32 bits wide and shifting it by 32 bits is a no-op.
  This was triggered by an evil distribution of character incidences. 
  A distribution of 2917027827 characters made of 202 distinct values led to
  34 occurrences of 32-bit Huffman codes.
  This might have been the first time ever that write_bits() had to write
  32-bit values. Since it can be expected that one day even 32 bits might
  be insufficient, the third changeset suggests to enlarge some variables
  to 64 bits.

  myisam/myisampack.c
    1.35 05/03/22 15:42:04 ingo@stripped +64 -2
    Bug#8321 - myisampack bug in compression algorithm
    This is the second of three changesets. It contains the pure bug fix.
    It also contains after-review fixes.
    The problem was that with gcc on x86, shifts are done modulo word size. 
    'value' is 32 bits wide and shifting it by 32 bits is a no-op.
    This was triggered by an evil distribution of character incidences. 
    A distribution of 2917027827 characters made of 202 distinct values led to
    34 occurrences of 32-bit Huffman codes.
    This might have been the first time ever that write_bits() had to write
    32-bit values. Since it can be expected that one day even 32 bits might
    be insufficient, the third changeset suggests to enlarge some variables
    to 64 bits.

# This is a BitKeeper patch.  What follows are the unified diffs for the
# set of deltas contained in the patch.  The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User:	ingo
# Host:	chilla.local
# Root:	/home/mydev/mysql-4.0-bug8321

--- 1.34/myisam/myisampack.c	Wed Oct 27 21:51:24 2004
+++ 1.35/myisam/myisampack.c	Tue Mar 22 15:42:04 2005
@@ -1975,7 +1975,34 @@
   {
     reg3 uint byte_buff;
     bits= (uint) -file_buffer.bits;
-    byte_buff=file_buffer.byte | (uint) (value >> bits);
+    /*
+      Do not try to shift the value if all of its bits would be shifted
+      out of the value. Some platforms shift modulo word width. This
+      would mean no shift at all. Hence, the bits of 'value' would mix
+      with the buffer contents, which would make a mess. (Bug #8321)
+
+      A note for the "casual" reader. You might have noticed the
+      function comment, which says "output `bits` low bits of `value'".
+      You might know that a 'bit' (binary digit) is the smallest piece
+      of information that a computer can work with. You might also know
+      that computers work with bunches of eight bits (bytes) for
+      efficiency reasons. Overmore, to be able to express bigger values
+      than could be done by a byte, modern computers have the ability to
+      work with so called words of different sizes, which, however,
+      always are made of a couple of bytes (usually 2 or 4 or 8). Since
+      the smallest cell of information, a computer can store is a byte,
+      the "size" of a byte is defined as "one". Thus, the expression "8
+      * sizeof(value)" is the number of bits of value (8 bits per byte
+      times the number of bytes). Together with the function comment
+      ("output `bits` low bits of `value'") you might understand that
+      this function will produce predictable results only if the number
+      of bits to write does not exceed the number of bits of 'value'.
+      Consequently, the below statement does not take care of the
+      nonsensical situation, where 'bits' migth be bigger than "8 *
+      sizeof(value)".
+    */
+    byte_buff= (file_buffer.byte |
+               ((bits != 8 * sizeof(value)) ? (uint) (value >> bits) : 0));
 #if BITS_SAVED == 32
     *file_buffer.pos++= (byte) (byte_buff >> 24) ;
     *file_buffer.pos++= (byte) (byte_buff >> 16) ;
@@ -1983,7 +2010,42 @@
     *file_buffer.pos++= (byte) (byte_buff >> 8) ;
     *file_buffer.pos++= (byte) byte_buff;
 
-    value&=(1 << bits)-1;
+    /*
+      Do not try to shift a bit through all of the word. Some platforms
+      shift modulo word width. This would mean no shift at all. Hence,
+      the 1 would stay a 1. Another 1 gets subtracted and so a 0
+      remains. This would clear the value here. (Bug #8321)
+
+      A note for the "casual" reader. You might have noticed the
+      function comment, which says "output `bits` low bits of `value'".
+      You might know that a 'bit' (binary digit) is the smallest piece
+      of information that a computer can work with. You might also know
+      that computers work with bunches of eight bits (bytes) for
+      efficiency reasons. Overmore, to be able to express bigger values
+      than could be done by a byte, modern computers have the ability to
+      work with so called words of different sizes, which, however,
+      always are made of a couple of bytes (usually 2 or 4 or 8). Since
+      the smallest cell of information, a computer can store is a byte,
+      the "size" of a byte is defined as "one". Thus, the expression "8
+      * sizeof(value)" is the number of bits of value (8 bits per byte
+      times the number of bytes). Together with the function comment
+      ("output `bits` low bits of `value'") you might understand that
+      this function will produce predictable results only if the number
+      of bits to write does not exceed the number of bits of 'value'.
+      Consequently, the below statement does not take care of the
+      nonsensical situation, where 'bits' migth be bigger than "8 *
+      sizeof(value)".
+
+      The sense of the below statement is to clear all of the bits of
+      'value' which have already been copied into the file buffer. This
+      is done by shifting a '1' bit to the position of the lowest bit
+      copied above. Subtracting '1' from that value, clears that bit and
+      sets all bits below it. Using that "mask" in an bit-wise 'and'
+      operation, retains all those bits in 'value' and clears all higher
+      bits.
+    */
+    if (bits != 8 * sizeof(value))
+      value&= (((ulong) 1) << bits) - 1;
 #if BITS_SAVED == 16
     if (bits >= sizeof(uint))
     {
Thread
bk commit into 4.0 tree (ingo:1.2089) BUG#8321ingo22 Mar