List: Falcon Storage Engine
From: Ann W. Harrison  Date: November 11 2008 11:19pm
Subject: Re: Serial log record encoding

    Now that you've solved that problem, here's the answer to the

<previous send was canceled because the attachment didn't please
Sun's mail system - so instead of carefully crafted formatting,
you get plain text.  bummer>



Falcon record encoding

The purpose of data compression in a database is to improve throughput by reducing the 
amount of I/O needed to read a set of records.   Serious compression algorithms can 
reduce the amount of data stored enormously – just zip a database file to see how much 
can be saved – but they do so at a cost of CPU that overwhelms the I/O savings.  
Simple run-length compression reduces the amount of data stored, cheaply, but when the 
amount of empty space is significant, it wastes considerable space. 

Much of the compressible space in a database occurs because fixed length fields are used 
to hold variable length data.  Fields must be declared large enough to hold the largest 
possible value.   Compression helps primarily by removing the trailing blanks in char and 
varchar fields.

Rather than compress data, Falcon uses a variable length data encoding for records stored 
in the memory-resident record cache and on disk.  The encoding depends on the stored 
values, not on the type declaration of the field.  The encoding eliminates excess space, at
a slight increase in the cost of referencing individual fields.  

When records hold their fields in full storage format, the system can reference individual
fields by indexing directly into the record.  Any value-based compression or encoding 
requires special handling to locate individual fields.  To reference field three, for
example, requires reading the encoding byte and possibly the length of fields one and two
to locate the code for field three.  The in-memory record handling code keeps an index of
the offsets of fields it has visited, so finding field thirty-six after finding field
thirty-four does not require looking at fields one through thirty-three. 
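
The offset index described above can be sketched roughly as follows.  This is only an
illustration, not Falcon's code: the class name, the step counter, and the toy field
layout (one header byte giving the number of data bytes that follow) are all assumptions
made to keep the example short.

```python
class EncodedRecord:
    """Toy record: each field is one length byte followed by that many data bytes.

    The layout is a hypothetical stand-in for the real type-byte encoding; only the
    lazy offset index is the point of the sketch.
    """

    def __init__(self, data: bytes):
        self.data = data
        self.offsets = [0]   # offsets[i] is the start of field i (0-based)
        self.steps = 0       # how many fields had to be skipped over so far

    def _field_size(self, offset: int) -> int:
        # Header byte plus the number of data bytes it announces.
        return 1 + self.data[offset]

    def field(self, index: int) -> bytes:
        # Walk forward only from the last field already visited.
        while len(self.offsets) <= index:
            last = self.offsets[-1]
            self.offsets.append(last + self._field_size(last))
            self.steps += 1
        start = self.offsets[index]
        return self.data[start + 1 : start + self._field_size(start)]


record = EncodedRecord(bytes([1]) + b"a" + bytes([0]) + bytes([3]) + b"ccc")
third = record.field(2)    # walks over fields 0 and 1 once
second = record.field(1)   # answered from the index, no further walking
```

Once a field has been visited, any earlier field is found from the index, and a later
field only costs the walk from the last visited one.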

Each field in a Falcon encoded record starts with a byte that describes the field, giving
a possible 255 different field types, which is a lot, even considering the number of data 
types currently supported.  The types are declared in an enum (EncodedDataStream.h).

The enum names generally have the format eds<XXX><YYY><nn>.  <XXX> is the data type.  
<YYY> is either Len or Count.  If it is Len, then <nn> is generally the number of bytes
that follow.  If <YYY> is Count, then <nn> is generally the number of count bytes that
represent the length of the actual value.

Regardless of type, all null-valued fields are represented by a type byte that indicates
the field is null.  Nulls are edsNull, regardless of their declared type, making it the 
exception to the naming rule, since it has neither a <YYY> section nor an <nn>.  Before 
switching from compression to record encoding, Falcon kept bits indicating which fields 
were null, effectively using one byte for every eight fields in the record.  Record 
encoding eliminates the need for null flag bits.

Falcon represents several MySQL data types as unscaled integers: tiny, short, int24, long, 
longlong, year, enum, set, and bit.  Numeric and decimal values less than 19 digits with a
scale of zero are also stored as unscaled integers.

Unscaled integers between –10 and 31 are represented by distinct types.  Regardless of 
the declaration of the field – tinyint, smallint, int, or bigint – if the value is between
–10 and 31, the field is represented by a single byte.  For example, any integer field
with a value of zero is represented by edsInt0, any integer field with the value 22 is
represented as edsInt22, and any integer field with the value –9 is represented by
edsIntMinus9.

Larger integer values start with a single byte that indicates how many bytes are required 
to represent the value, followed by that number of bytes.  The range is from one byte
(values from –128 to 127, excluding –10 to 31) to eight bytes, including the rarely seen
3, 5, and 7 byte lengths.  For example, the value 100,000 is expressed in four bytes – one
type byte and three data bytes – as edsIntLen3, 1, 134, 160 [Note – the bytes may be in
reverse order, but they are consistent and not dependent on machine architecture] [Second
note, that conversion was done partly on my fingers and is not reliable.]  The
representation is the same whether the field was declared as int24, long, or longlong.
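
As a cross-check of the arithmetic above, here is a minimal sketch of the unscaled
integer rule.  It is not taken from EncodedDataStream: the function names are
hypothetical, the type byte is shown as its enum name rather than a numeric code, names
such as edsIntMinus10 and edsIntLen1 are extrapolated from the naming pattern, and
big-endian byte order is an assumption (the note above says the real order may be
reversed).

```python
from typing import Tuple


def encode_int(value: int) -> Tuple[str, bytes]:
    """Return (type name, data bytes) for an unscaled integer."""
    if -10 <= value <= 31:
        # Small values are carried entirely by the type byte.
        name = f"edsIntMinus{-value}" if value < 0 else f"edsInt{value}"
        return name, b""
    # Otherwise use the fewest bytes that hold the value in two's complement.
    for n in range(1, 9):
        low, high = -(1 << (8 * n - 1)), (1 << (8 * n - 1)) - 1
        if low <= value <= high:
            return f"edsIntLen{n}", value.to_bytes(n, "big", signed=True)
    raise OverflowError("value does not fit in eight bytes")


def decode_int(name: str, data: bytes) -> int:
    """Inverse of encode_int, driven by the type name."""
    if name.startswith("edsIntLen"):
        return int.from_bytes(data, "big", signed=True)
    if name.startswith("edsIntMinus"):
        return -int(name[len("edsIntMinus"):])
    return int(name[len("edsInt"):])


name, data = encode_int(100_000)
print(name, list(data))
```

Under these assumptions the sketch reproduces the hand conversion: 100,000 is 0x0186A0,
which is the three data bytes 1, 134, 160.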

For some ranges of values of integer fields, the encoded data representation is longer than 
the actual field, but any integer whose value has a leading byte of zeros encodes to its 
storage length. Any integer whose value has two or more leading bytes of zeros encodes 
to less than its storage length.  

Falcon uses scaled integer values to represent scaled numeric and decimal numbers in 
MySQL.  A scaled integer is stored as a binary value plus a scale.  They are represented by 
a type byte that indicates the type and the number of bytes required to represent the
value, followed by a byte containing the scale.  This is the exception to the rule that
<nn> represents the number of bytes remaining in the field.  In the case of scaled numbers,
the remaining bytes are <nn> plus one for the scale.  A numeric field described as (18,4)
containing the value zero is represented in two bytes as edsScaledLen0, 4.  The same 
definition containing the value 100,000 is represented in five bytes as edsScaledLen3, 4, 
1, 134, 160 [same notes as above].  
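
A small sketch of the scaled layout, under the same caveats as before: the function name
is hypothetical, the byte order is assumed to be big-endian, and the argument is taken to
be the binary value already stored for the field, so the sketch says nothing about how a
decimal literal is turned into that binary value.

```python
from typing import Tuple


def encode_scaled(unscaled: int, scale: int) -> Tuple[str, bytes]:
    """Return (type name, bytes after the type byte) for a scaled integer.

    The bytes are the scale byte followed by <nn> value bytes, so the field
    occupies <nn> + 1 bytes after the type byte.
    """
    if not 0 <= scale <= 255:
        raise ValueError("scale must fit in one byte")
    if unscaled == 0:
        n = 0  # zero needs no value bytes at all
    else:
        # Fewest bytes that hold the value in two's complement.
        magnitude = unscaled if unscaled > 0 else ~unscaled
        n = (magnitude.bit_length() + 8) // 8
    if n > 8:
        raise OverflowError("value does not fit in eight bytes")
    return f"edsScaledLen{n}", bytes([scale]) + unscaled.to_bytes(n, "big", signed=True)


print(encode_scaled(0, 4))
print(encode_scaled(100_000, 4))
```

With a scale of 4 the zero value comes out as the type name plus the single scale byte,
and the binary value 100,000 comes out as the scale byte followed by 1, 134, 160.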

If the scaled numeric value has a precision greater than 18 digits it is represented as 
edsScaledCount1 followed by a byte of scale, a byte of length, followed by the bytes of 

The encoding of floating point numbers is similar, though the truncation occurs at the end
of floating point numbers.  All floating point is represented internally as double.
Trailing zeros in the mantissa are eliminated.  A floating point zero is represented as a 
single byte of type edsDoubleLen0.   The reader is invited to guess the encoding of 
100,000 as a floating-point number.   My guess is edsDoubleLen3, 64, 248, 106. [Same 
notes as above].   The encoding of floating point numbers goes from edsDoubleLen0 to 
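
The guess above can be checked with a short sketch.  It assumes the value is laid out as
a big-endian IEEE 754 double and that whole trailing zero bytes are dropped; the function
names are hypothetical, and only edsDoubleLen0 and edsDoubleLen3 are names taken from the
text.

```python
import struct
from typing import Tuple


def encode_double(value: float) -> Tuple[str, bytes]:
    """Return (type name, data bytes) with trailing zero bytes removed."""
    raw = struct.pack(">d", value)      # 8 bytes: sign and exponent first
    trimmed = raw.rstrip(b"\x00")       # low-order mantissa zeros fall off the end
    return f"edsDoubleLen{len(trimmed)}", trimmed


def decode_double(data: bytes) -> float:
    """Pad the stored bytes back out to 8 and reinterpret as a double."""
    return struct.unpack(">d", data.ljust(8, b"\x00"))[0]


name, data = encode_double(100_000.0)
print(name, list(data))
```

100,000.0 is 0x40F86A0000000000 as a double, so under these assumptions the three
surviving bytes are 64, 248, 106, which agrees with the guess.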

Netfrastructure represents all character strings as UTF8.   An empty character string is 
encoded as a single byte of edsUtf8Len0.  Character strings between 1 byte and 39 bytes 
have distinct codes ranging from edsUtf8Len1 followed by one byte of data, to 
edsUtf8Len39 followed by 39 bytes of data.  Larger character strings are represented by 
a code followed by a number of bytes that describe the length of the string, followed by 
the string.  This paper, so far, would be a string of about 4666 bytes – all in the
character representation of UTF8 - and would be encoded as edsUtf8Count2, 18, 58, 
followed by 4666 bytes starting with 'F', 'a', 'l', 'c', 'o', 'n', ' ', 'r', 'e', 'c', 'o',
'r', 'd', ' ', 'e', 'n', 'c', 'o', 'd', 'i', 'n', 'g' ….  [Notes: the bytes of the length
may be in the wrong order, and I may have miscalculated the byte value of 4,666 and the
length may have changed.]
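
The Len/Count split for strings can be sketched like this.  As with the other sketches,
the function name is hypothetical, and the big-endian, unsigned, minimal-width count is
an assumption; the real count bytes may be ordered differently.

```python
from typing import Tuple


def encode_utf8(text: str) -> Tuple[str, bytes]:
    """Return (type name, bytes after the type byte) for a character string."""
    data = text.encode("utf-8")
    length = len(data)
    if length <= 39:
        # Short strings: the type byte itself carries the length.
        return f"edsUtf8Len{length}", data
    # Longer strings: <nn> count bytes give the length, then the string follows.
    count_bytes = (length.bit_length() + 7) // 8
    return f"edsUtf8Count{count_bytes}", length.to_bytes(count_bytes, "big") + data


name, encoded = encode_utf8("x" * 4666)
print(name, list(encoded[:2]))
```

4,666 is 0x123A, so the two count bytes are 18 and 58, matching the figures above.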

In the MySQL environment, Falcon handles character strings as opaque data, with an 
encoding that is similar to the Netfrastructure character encoding.  Opaque data with no 
bytes is represented as edsOpaqueLen0.  Four bytes of opaque data is represented as 
edsOpaqueLen4 plus the four bytes.  4,666 bytes of opaque data is represented as 
edsOpaqueCount2, 18, 58,  …. [Notes as above.]   Tiny blobs – blobs of fewer than 256 
bytes  – are stored as edsOpaqueCount1 if they are longer than 39 bytes, or as 
edsOpaqueLen<number> for blobs between 0 and 39 bytes.

Blob and clob data are not encoded.   Falcon stores blobs and clobs in a separate data 
segment from the record data.  Records contain the blob and clob numbers that Falcon 
uses to locate their data.  The major saving of encoding is the removal of unneeded space 
from records.  Blobs and clobs are stored at their exact length and do not suffer from the
problem of over definition in fixed length fields.

The lengths of blob and clob identifiers vary between zero and four bytes.  A two-byte 
blob identifier is encoded as edsBlobLen2 followed by the two bytes of the identifier.

Falcon converts MySQL date and time data types to its internal formats: Time, 
Timestamp, and Date.  Time is stored as the number of milliseconds since midnight, and 
encoded as edsTimeLen0 (midnight) to edsTimeLen4.  Date is stored as the number of 
milliseconds since January 1, 1970 and encoded as edsMilliSec0 to edsMilliSec8.  
Timestamp is the number of nanoseconds since January 1, 1970, encoded as 
edsNanoSecLen0 to edsNanoSecLen8.
