Now that you've solved that problem, here's the answer to the
<previous send was canceled because the attachment didn't please
Sun's mail system - so instead of carefully crafted formatting,
you get plain text. bummer>
Falcon record encoding
The purpose of data compression in a database is to improve throughput by reducing the
amount of I/O needed to read a set of records. Serious compression algorithms can
reduce the amount of data stored enormously just zip a database file to see how much
can be saved but they do so at a cost of CPU that overwhelms the I/O savings.
Simple run-length compression reduces the amount of data stored, cheaply, but when the
amount of empty space is significant, it wastes considerable space.
Much of the compressible space in a database occurs because fixed length fields are used
to hold variable length data. Fields must be declared large enough to hold the largest
possible value. Compression helps primarily by removing the trailing blanks in char and
Rather than compress data, Falcon uses a variable length data encoding for records stored
in the memory-resident record cache and on disk. The encoding depends on the stored
values, not on the type declaration of the field. The encoding eliminates excess space,
a slight increase in the cost of referencing individual fields.
When records hold their fields in full storage format, the system can reference individual
fields by indexing directly into the record. Any value-based compression or encoding
requires special handing to locate individual fields. To reference field three, for
requires reading the encoding byte and possibly the length of fields one and two to locate
the code for field three. The in-memory record handling code keeps an index of the
offsets of fields it has visited, so finding field thirty-six after finding field
does not require looking at fields one through thirty-three.
Each field in a Falcon encoded record starts with a byte that describes the field, giving
possible 255 different field types, which is a lot, even considering the number of data
types currently supported. The types are declared in an enum (EncodedDataStream.h).
The enum names generally have the format eds<XXX><YYY><nn>.
<XXX>is the data
type. <YYY> is either Len or Count. If it is Len, then nn is generally the number
bytes that follow. If <YYY> is Count, then nn is generally the number of count
that represent the length of the actual value.
Regardless of type, all null-valued fields are represented by a type byte that indicates
the field is null. Nulls are edsNull, regardless of their declared type, making it the
exception to the naming rule, since it has neither a <YYY> section nor an
switching from compression to record encoding, Falcon kept bits indicating which fields
were null, effectively using one byte for every eight fields in the record. Record
encoding eliminates the need for null flag bits.
Falcon represents several MySQL data types as unscaled integer: tiny, short, int24, long,
longlong, year, enum, set, and bit. Numeric and decimal values less than 19 digits with a
scale of zero are also stored as unscaled integers.
Unscaled integers between 10 and 31 are represented by distinct types. Regardless of
the declaration of the field tinyint, smallint, int, or bigint, if the value is between
and 31, the field is represented by a single byte. For example, any integer field with a
value of zero is represented by edsInt0, any integer field with the value 22 is
as edsInt22, and any integer field with the value 9 is represented by edsIntMinus9.
Larger integer values start with a single byte that indicates how many bytes are required
to represent the value, and that number of bytes. The range is from one byte (values from
128 to 127, excluding 10 to 31) to eight bytes, including the rarely seen 3, 5, and 7
lengths. For example, the value 100,000 is expressed in four bytes one type byte and
three data bytes as edsIntLen3, 1, 134, 160 [Note the bytes may be in reverse order,
they are consistent and not dependent on machine architecture] [Second note, that
conversion was done partly on my fingers and is not reliable.] The representation is the
same whether the field was declared as int24, long, or longlong.
For some ranges values of integer fields, the encoded data representation is longer than
the actual field, but any integer whose value has a leading byte of zeros encodes to its
storage length. Any integer whose value has two or more leading bytes of zeros encodes
to less than its storage length.
Falcon uses scaled integer values to represent scaled numeric and decimal numbers in
MySQL. A scaled integer is stored as a binary value plus a scale. They are represented by
a type byte that indicates the type and the number of bytes required to represent the
value, followed by a byte containing the scale. This is the exception to the rule that
represents the number of bytes remaining in the field. In the case of scaled numbers, the
remaining bytes are <nn> plus one for the scale. A numeric field described as (18,
containing the value zero is represented in two bytes as edsScaledLen0, 4. The same
definition containing the value 100,000 is represented in five bytes as edsScaledLen3, 4,
1, 134, 160 [same notes as above].
If the scaled numeric value has a precision greater than 18 digits it is represented as
edsScaledCount1 followed by a byte of scale, a byte of length, followed by the bytes of
The encoding of floating point numbers is similar, though the truncation occurs at the end
of floating point numbers. All floating point is represented internally as double
Trailing zeros in the mantissa are eliminated. A floating point zero is represented as a
single byte of type edsDoubleLen0. The reader is invited to guess the encoding of
100,000 as a floating-point number. My guess is edsDoubleLen3, 64, 248, 106. [Same
notes as above]. The encoding of floating point numbers goes from edsDoubleLen0 to
Netfrastructure represents all character strings as UTF8. An empty character string is
encoded as a single byte of edsUtf8Len0. Character strings between 1 byte and 39 bytes
have distinct codes ranging from edsUtf8Len1 followed by one byte of data, to
edsUtf8Len39 followed by 39 bytes of data. Larger characters strings are represented by
a code followed by a number of bytes that describe the length of the string, followed by
the string. This paper, so far, would be a string of about 4666 bytes all in the
character representation of UTF8 - and would be encoded as edsUtf8Count2, 18, 58,
followed by 4666 bytes starting with 'F', 'a', 'l', 'c', 'o', 'n', ' r', 'e', 'c', 'o',
'r', 'd', 'e',
'n', 'c', 'o', 'd', 'i', 'n', 'g'
. [Notes; the bytes of the length may be in the wrong
and I may have miscalculated the byte value of 4,666 and the length may have changed.]
In the MySQL environment, Falcon handles character strings as opaque data, with an
encoding that is similar to the Netfrastructure character encoding. Opaque data with no
bytes is represented as edsOpaqueLen0. Four bytes of opaque data is represented as
edsOpaqueLen4 plus the four bytes. 4,666 bytes of opaque data is represented as
edsOpaqueCount2, 18, 58,
. [Notes as above.] Tiny blobs blobs of fewer than 256
bytes are stored as edsOpaqueCount1 if they are longer than 39 bytes, or as
edsOpaqueLen<number> for blobs between 0 and 39 bytes.
Blob and clob data are not encoded. Falcon stores blobs and clobs in a separate data
segment from the record data. Records contain the blob and clob numbers that Falcon
uses to locate their data. The major saving of encoding is the removal of unneeded space
from records. Blobs and clobs are stored at their exact length and do not suffer from the
problem of over definition in fixed length fields.
The lengths of blob and clob identifiers vary between zero and four bytes. A two-byte
blob identifier is encoded as edsBlobLen2 followed by the two bytes of the identifier.
Falcon converts MySQL date and time data types to its internal formats: Time,
Timestamp, and Date. Time is stored as the number of milliseconds since midnight, and
encoded as edsTimeLen0 (midnight) to edsTimeLen4. Date is stored as the number of
milliseconds since January 1, 1970 and encoded as edsMilliSec0 to edsMilliSec8.
Timestamp is the number of nanoseconds since January 1, 1970, encoded as
edsNanoSecLen0 to edsNanoSecLen8.