From: Ann W. Harrison Date: December 3 2008 11:11pm Subject: trailing spaces and compound keys and tabs List-Archive: http://lists.mysql.com/falcon/244 Message-Id: <4937123D.1010300@mysql.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=ISO-8859-1 Content-Transfer-Encoding: 7BIT Falconers all, Falcon's indexes were designed to be as dense as possible - the keys are variable length with both suffix and prefix compression on all key values, including suffix compression on the leading segments of compound keys. Compressing the trailing blanks on compound keys will lead to confusion unless some information is added to separate the keys. If these fields with these values form a compound key Field1 Field2 a bcdefgh ab cdefgh abc defgh abcd efgh abcde fgh abcdef gh abcdefg h If you just strip the spaces and stick the two fields together, all those values are equivalent. What Falcon does is pad each leading field out with spaces to a multiple of four bytes (I think, not characters, though that should be checked and probably changed for UTF-16). It then inserts a binary byte after each group of four characters. The byte holds the position of the field in the key. So the value above would become Field1 Field2 a<0x20><0x20><0x20><0x0>bcde<0x1>fgh ab<0x20><0x20><0x0>cdef<0x1>gh abc<0x20><0x0>defg<0x1>h abcd<0x0>efgh abcd<0x0>e<0x20><0x20><0x20><0x0>fgh abcd<0x0>ef<0x20><0x20><0x0>gh abcd<0x0>efg<0x20><0x0>h (Putting the field position byte every five bytes is arbitrary - spacing them further apart reduces the key growth at a cost of raising the minimum key size.) The question that concerns me at the moment is whether introducing tab <0x7> characters causes a problem with this trick. I don't think so, but ... Here are the cases, with each line of the values above preceded by one that has a as the last byte of the first field. values: Field1 Field2 a<0x7> bcdefgh a bcdefgh ab<0x7> cdefgh ab cdefgh abc<0x7> defgh abc defgh abcd<0x7> efgh abcd efgh abcde<0x7> fgh abcde fgh abcdef<0x7> gh abcdef gh abcdefg<0x7> h abcdefg h encoding: Field1 Field2 a<0x7><0x20><0x20><0x0>bcde<0x1>fgh a<0x20><0x20><0x20><0x0>bcde<0x1>fgh ab<0x7><0x20><0x0>cdef<0x1>gh ab<0x20><0x20><0x0>cdef<0x1>gh abc<0x7><0x0>defg<0x1>h abc<0x20><0x0>defg<0x1>h abcd<0x0><0x7><0x20><0x20><0x20><0x0>efgh abcd<0x0>efgh abcde<0x0><0x7><0x20><0x20><0x0>fgh abcd<0x0>e<0x20><0x20><0x20><0x0>fgh abcd<0x0>ef<0x7><0x20><0x0>gh abcd<0x0>ef<0x20><0x20><0x0>gh abcd<0x0>efg<0x7><0x0>h abcd<0x0>efg<0x20><0x0>h If anyone has followed this, do you agree that it seems to get the right answer? Best, Ann