List:Falcon Storage Engine« Previous MessageNext Message »
From:Ann W. Harrison Date:December 3 2008 11:11pm
Subject:trailing spaces and compound keys and tabs
View as plain text  
Falconers all,

    Falcon's indexes were designed to be as dense as possible - the
keys are variable length with both suffix and prefix compression on
all key values, including suffix compression on the leading segments
of compound keys.  Compressing the trailing blanks on compound keys
will lead to confusion unless some information is added to separate
the keys.

If these fields with these values form a compound key

Field1   Field2

a        bcdefgh
ab       cdefgh
abc      defgh
abcd     efgh
abcde    fgh
abcdef   gh
abcdefg  h

If you just strip the spaces and stick the two fields together, all
those values are equivalent.

What Falcon does is pad each leading field out with spaces to a
multiple of four bytes (I think, not characters, though that
should be checked and probably changed for UTF-16). It then inserts
a binary byte after each group of four characters.  The byte holds
the position of the field in the key.  So the value above would
become

Field1         Field2

a<0x20><0x20><0x20><0x0>bcde<0x1>fgh
ab<0x20><0x20><0x0>cdef<0x1>gh
abc<0x20><0x0>defg<0x1>h
abcd<0x0>efgh
abcd<0x0>e<0x20><0x20><0x20><0x0>fgh
abcd<0x0>ef<0x20><0x20><0x0>gh
abcd<0x0>efg<0x20><0x0>h

(Putting the field position byte every five bytes is arbitrary -
spacing them further apart reduces the key growth at a cost of
raising the minimum key size.)

The question that concerns me at the moment is whether introducing
tab <0x7> characters causes a problem with this trick.  I don't
think so, but ...  Here are the cases, with each line of the
values above preceded by one that has a <tab> as the last byte
of the first field.

values:

Field1        Field2

a<0x7>        bcdefgh
a             bcdefgh
ab<0x7>       cdefgh
ab            cdefgh
abc<0x7>      defgh
abc           defgh
abcd<0x7>     efgh
abcd          efgh
abcde<0x7>    fgh
abcde         fgh
abcdef<0x7>   gh
abcdef        gh
abcdefg<0x7>  h
abcdefg       h


encoding:

Field1         Field2

a<0x7><0x20><0x20><0x0>bcde<0x1>fgh
a<0x20><0x20><0x20><0x0>bcde<0x1>fgh
ab<0x7><0x20><0x0>cdef<0x1>gh
ab<0x20><0x20><0x0>cdef<0x1>gh
abc<0x7><0x0>defg<0x1>h
abc<0x20><0x0>defg<0x1>h
abcd<0x0><0x7><0x20><0x20><0x20><0x0>efgh
abcd<0x0>efgh
abcde<0x0><0x7><0x20><0x20><0x0>fgh
abcd<0x0>e<0x20><0x20><0x20><0x0>fgh
abcd<0x0>ef<0x7><0x20><0x0>gh
abcd<0x0>ef<0x20><0x20><0x0>gh
abcd<0x0>efg<0x7><0x0>h
abcd<0x0>efg<0x20><0x0>h


If anyone has followed this, do you agree that it seems to
get the right answer?

Best,

Ann
Thread
trailing spaces and compound keys and tabsAnn W. Harrison4 Dec
  • Re: trailing spaces and compound keys and tabsKevin Lewis4 Dec
    • Re: trailing spaces and compound keys and tabsAnn W. Harrison4 Dec
  • Re: trailing spaces and compound keys and tabsAnn W. Harrison5 Dec