Author: istruewing
Date: 2006-11-02 11:45:53 +0100 (Thu, 02 Nov 2006)
New Revision: 3828
Log:
MyISAM compressed file layout
- Changes in Huffman tree objects nomenclature
- Fixes on the bit level
- Ranges in length encoding
- Forgotten alignment bits
- Changed some field names hoping to be clearer
Modified:
trunk/internals/myisam.xml
Modified: trunk/internals/myisam.xml
===================================================================
--- trunk/internals/myisam.xml 2006-11-02 10:36:10 UTC (rev 3827)
+++ trunk/internals/myisam.xml 2006-11-02 10:45:53 UTC (rev 3828)
Changed blocks: 5, Lines Added: 34, Lines Deleted: 16; 4215 bytes
@@ -1849,12 +1849,12 @@
<para>
A length from 1 to 253 bytes is represented in one byte. A
- length of 254 to 65536 bytes (64KB) is represented by three
+ length of 254 to 65535 bytes (64KB-1) is represented by three
bytes. The first contains the value 254 and the next two bytes
contain the plain length. The low order byte goes first. A
- length of 65537 to 4294967296 bytes (4GB) is represented by five
- bytes. The first contains the value 255 and the next four bytes
- contain the plain length. The low order byte goes first.
+ length of 65536 to 4294967295 bytes (4GB-1) is represented by
+ five bytes. The first contains the value 255 and the next four
+ bytes contain the plain length. The low order byte goes first.
</para>
<para>
@@ -1871,22 +1871,22 @@
<para>
The code trees are binary trees. Every node has exactly two
- children. The children can be leaves or nodes. Each leaf contains one
- original, uncompressed value. The nodes do not contain values,
- but only pointers to the left and right child. The Huffman codes
- represent the navigation through the tree. Every left branch
- gets a 0 bit, every right branch gets a 1 bit.
+ children. The children can be leaves or branches. A leaf
+ contains one original, uncompressed value. A branch contains a
+ pointer to another node. The Huffman codes represent the
+ navigation through the tree. Every left branch gets a 0 bit,
+ every right branch gets a 1 bit.
</para>
<para>
The in-memory representation of the trees are two unsigned
integers per node. Each describes either a leaf value or an
- offset (in unsigned integers) to the child node. To distinguish
- values from offsets, the 15th bit (decimal value 32768) is set
- together with offsets. This is safe as the size of the trees is
- limited by either having a maximum of 256 elements for byte
- value compression or 4096 elements for distinct column value
- compression.
+ offset (in unsigned integers, relative to this node) to another
+ node. To distinguish values from offsets, bit 15 (decimal
+ value 32768) is set together with offsets. This is safe as the
+ size of the trees is limited by either having a maximum of 256
+ elements for byte value compression or 4096 elements for
+ distinct column value compression.
</para>
<para>
@@ -1957,7 +1957,7 @@
4 byte total number of bytes collected for distinct column values
2 byte number of code trees
1 byte maximum number of bytes required to represent record+blob lengths
-1 byte number of bytes required to represent the compressed data file length
+1 byte record pointer length, number of bytes for compressed data file length
4 byte zeros
</programlisting>
@@ -1995,6 +1995,14 @@
</programlisting>
<para>
+ Alignment:
+ </para>
+
+<programlisting>
+x bits alignment to the next byte border
+</programlisting>
+
+ <para>
Code Trees. For every tree:
</para>
@@ -2024,7 +2032,17 @@
<programlisting>
1-5 bytes length of the compressed record in bytes
+ 1st byte 0..253 length
+ 254 length encoded in the next two bytes little endian
+ 255 length encoded in the next x bytes little endian
+ x = 3 for pack file version 1
+ x = 4 for pack file version > 1
1-5 bytes total length of all expanded blobs of this record
+ 1st byte 0..253 length
+ 254 length encoded in the next two bytes little endian
+ 255 length encoded in the next x bytes little endian
+ x = 3 for pack file version 1
+ x = 4 for pack file version > 1
For every column:
If pack type includes PACK_TYPE_SPACE_FIELDS,
1 bit 1 = spaces only, 0 = not only spaces
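
The length prefix documented in the last hunk (one byte for 0..253, marker 254 plus two bytes, marker 255 plus x bytes with x depending on the pack file version) can be sketched as follows. This is a minimal illustration and not the MyISAM source: the function name `read_pack_length` and its parameters are hypothetical, and it assumes the caller already knows the pack file version.

```python
def read_pack_length(buf, pos, version):
    """Decode one length prefix starting at buf[pos].

    Returns (length, next_pos). The name and signature are illustrative
    assumptions; only the byte layout comes from the documentation.
    """
    first = buf[pos]
    if first < 254:
        # 0..253: the first byte is the length itself.
        return first, pos + 1
    if first == 254:
        # Marker 254: length in the next two bytes, low order byte first.
        return int.from_bytes(buf[pos + 1:pos + 3], "little"), pos + 3
    # Marker 255: length in the next x bytes, low order byte first.
    # x = 3 for pack file version 1, x = 4 for later versions.
    width = 3 if version == 1 else 4
    end = pos + 1 + width
    return int.from_bytes(buf[pos + 1:end], "little"), end


if __name__ == "__main__":
    # A record header starts with two such prefixes: the compressed record
    # length, then the total length of all expanded blobs.
    header = bytes([254, 0x34, 0x12, 7])
    record_len, pos = read_pack_length(header, 0, version=2)
    blob_len, pos = read_pack_length(header, pos, version=2)
    print(record_len, blob_len, pos)
```

Returning the next position alongside the value lets the two prefixes of a record header be read back to back without the caller tracking the variable width.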
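
The in-memory code tree described in the diff (two unsigned integers per node, each either a leaf value or an offset flagged by decimal value 32768) can be walked as in the sketch below. It rests on assumptions the text does not spell out: the two slots of a node are adjacent, the 0 bit selects the first slot and the 1 bit the second, and an offset is counted from the slot that holds it. The names `IS_OFFSET` and `decode_symbol` are used here for illustration only.

```python
IS_OFFSET = 0x8000  # decimal 32768, marks a slot as an offset rather than a value


def decode_symbol(table, bits):
    """Follow bits through the tree until a leaf value is reached.

    table: list of unsigned integers, two slots per node (assumed layout).
    bits:  iterator yielding 0 or 1; it is consumed only as far as needed.
    """
    pos = 0  # first slot of the root node
    for bit in bits:
        pos += bit                    # 0 = left slot, 1 = right slot
        entry = table[pos]
        if not entry & IS_OFFSET:
            return entry              # leaf: the original, uncompressed value
        pos += entry & ~IS_OFFSET     # branch: jump, relative to this slot
    raise ValueError("bit stream ended inside a code")


if __name__ == "__main__":
    # Hypothetical three-symbol tree: 'A' = 0, 'B' = 10, 'C' = 11.
    table = [ord("A"), IS_OFFSET | 1, ord("B"), ord("C")]
    stream = iter([0, 1, 0, 1, 1])
    print([chr(decode_symbol(table, stream)) for _ in range(3)])
```

Because leaf values are limited to 256 or 4096 distinct elements, they never reach 32768, which is why the flag bit can share the integer with the payload.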