List: Commits
From: istruewing  Date: November 2 2006 11:45am
Subject:svn commit - mysqldoc@docsrva: r3828 - trunk/internals
Author: istruewing
Date: 2006-11-02 11:45:53 +0100 (Thu, 02 Nov 2006)
New Revision: 3828

Log:
MyISAM compressed file layout
- Changes in Huffman tree objects nomenclature
- Fixes on the bit level
  - Ranges in length encoding
  - Forgotten alignment bits
  - Changed some field names hoping to be clearer


Modified:
   trunk/internals/myisam.xml


Modified: trunk/internals/myisam.xml
===================================================================
--- trunk/internals/myisam.xml	2006-11-02 10:36:10 UTC (rev 3827)
+++ trunk/internals/myisam.xml	2006-11-02 10:45:53 UTC (rev 3828)
Changed blocks: 5, Lines Added: 34, Lines Deleted: 16; 4215 bytes

@@ -1849,12 +1849,12 @@
 
       <para>
         A length from 1 to 253 bytes is represented in one byte. A
-        length of 254 to 65536 bytes (64KB) is represented by three
+        length of 254 to 65535 bytes (64KB-1) is represented by three
         bytes. The first contains the value 254 and the next two bytes
         contain the plain length. The low order byte goes first. A
-        length of 65537 to 4294967296 bytes (4GB) is represented by five
-        bytes. The first contains the value 255 and the next four bytes
-        contain the plain length. The low order byte goes first.
+        length of 65536 to 4294967295 bytes (4GB-1) is represented by
+        five bytes. The first contains the value 255 and the next four
+        bytes contain the plain length. The low order byte goes first.
       </para>
 
       <para>

@@ -1871,22 +1871,22 @@
 
       <para>
         The code trees are binary trees. Every node has exactly two
-        children. The children can be leaves or nodes. Each leaf contains one
-        original, uncompressed value. The nodes do not contain values,
-        but only pointers to the left and right child. The Huffman codes
-        represent the navigation through the tree. Every left branch
-        gets a 0 bit, every right branch gets a 1 bit.
+        children. The children can be leaves or branches. A leaf
+        contains one original, uncompressed value. A branch contains a
+        pointer to another node. The Huffman codes represent the
+        navigation through the tree. Every left branch gets a 0 bit,
+        every right branch gets a 1 bit.
       </para>
 
       <para>
         The in-memory representation of the trees are two unsigned
         integers per node. Each describes either a leaf value or an
-        offset (in unsigned integers) to the child node. To distinguish
-        values from offsets, the 15th bit (decimal value 32768) is set
-        together with offsets. This is safe as the size of the trees is
-        limited by either having a maximum of 256 elements for byte
-        value compression or 4096 elements for distinct column value
-        compression.
+        offset (in unsigned integers relative from this node) to another
+        node. To distinguish values from offsets, the 15th bit (decimal
+        value 32768) is set together with offsets. This is safe as the
+        size of the trees is limited by either having a maximum of 256
+        elements for byte value compression or 4096 elements for
+        distinct column value compression.
       </para>
 
       <para>

@@ -1957,7 +1957,7 @@
 4 byte  total number of bytes collected for distinct column values
 2 byte  number of code trees
 1 byte  maximum number of bytes required to represent record+blob lengths
-1 byte  number of bytes required to represent the compressed data file length
+1 byte  record pointer length, number of bytes for compressed data file length
 4 byte  zeros
 </programlisting>
 

@@ -1995,6 +1995,14 @@
 </programlisting>
 
       <para>
+        Alignment:
+      </para>
+
+<programlisting>
+x bits  alignment to the next byte border
+</programlisting>
+
+      <para>
         Code Trees. For every tree:
       </para>
 

@@ -2024,7 +2032,17 @@
 
 <programlisting>
 1-5 bytes  length of the compressed record in bytes
+    1. byte  0..253 length
+             254    length encoded in the next two bytes little endian
+             255    length encoded in the next  x  bytes little endian
+                    x = 3  for pack file version 1
+                    x = 4  for pack file version > 1
 1-5 bytes  total length of all expanded blobs of this record
+    1. byte  0..253 length
+             254    length encoded in the next two bytes little endian
+             255    length encoded in the next  x  bytes little endian
+                    x = 3  for pack file version 1
+                    x = 4  for pack file version > 1
 For every column:
     If pack type includes PACK_TYPE_SPACE_FIELDS,
         1 bit   1 = spaces only, 0 = not only spaces
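
Note on the length encoding touched by this change: the sketch below illustrates how the 1-5 byte length prefix described in the last hunk could be read. It is not taken from the MyISAM sources. The function name read_pack_length and its parameters are hypothetical, and the handling of the 255 marker (three following bytes for pack file version 1, four for later versions) follows the wording of the patch only.

```python
def read_pack_length(buf, pos=0, version=2):
    """Decode one length prefix from buf starting at pos.

    Returns (length, next_pos). Hypothetical helper; the layout follows
    the description in the patch, not the MyISAM source code.
    """
    first = buf[pos]
    if first < 254:
        # 0..253: the first byte is the length itself.
        return first, pos + 1
    if first == 254:
        width = 2                           # next two bytes, little endian
    else:
        width = 3 if version == 1 else 4    # marker 255, width per pack version
    start = pos + 1
    end = start + width
    if end > len(buf):
        raise ValueError("truncated length prefix")
    return int.from_bytes(buf[start:end], "little"), end


if __name__ == "__main__":
    # 254 marker followed by 0x1234 stored low order byte first.
    print(read_pack_length(bytes([254, 0x34, 0x12])))
```

The returned position lets a caller read the second prefix (total length of all expanded blobs) directly after the first one.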

