List: Falcon Storage Engine « Previous MessageNext Message » From: Kevin Lewis Date: October 21 2008 2:56pm Subject: bug#23692 Solutions View as plain text
>Lars-Erik wrote;
> ---------------------------------------------------------
> 2. What do you plan to accomplish by next time we meet?
> ---------------------------------------------------------
> * Continue to work on, and hopefully have a fix for bug#34479 and bug#23692
>     - This includes chatting/mailing with Kevin about the possible solutions

Lars-Erik, Jim, Ann,

Last Friday in our meeting, Jim mentioned that we should compare char
and varchar key values as if they have an imaginary trailing blank after
them.  This will work only if it happens when adding keys to the index
in addition to searching the index.  Also, it must happen when sorting
and searching in DeferredIndexes.

For example, if these values are found in the index in the order of what
is actually stored, they would  be (and currently are) like this;

notation  [this means the character is trimmed for the index]

0x 43
0x 43 [20]
0x 43 09
0x 43 43

Consider;  SELECT hex(a) FROM t1 WHERE a > 0x43;

If we implicitly add a 0x20 after the search key and the indexed key,
then we might find the 0x43 values, but the next key value will be 43
09, which is less than the current position.  In other words, it will
seem like the index is out of order.

But if we sort the keysin the index as if they contained that trailing
space, then they would look like this;

notation  {this means the character is assumed but missing}

0x 43 09
0x 43 {20}
0x 43 {[20]}
0x 43 43

Now the index keys are in SQL order and can correctly return both
SELECT hex(a) FROM t1 WHERE a > 0x43;
and   SELECT hex(a) FROM t1 WHERE a < 0x43;

In Falcon, key comparison is done on a byte by byte manner independent
of what type of key it is.  That is because all key values, whether
number or string, are converted into a form that can be compared
byte-wise left to right.  This is a great advantage when comparing
multi-column indexes of various types because the code is much simpler.
The strict field type comparisons happen later when we are searching
each record.  The solution Jim suggested Friday would necessitate that
we determine where in a multicolumn key a char or varchar field starts
and ends in order to know where to implicitly add a space character.
Unless someone can correct my thinking here, I do not think that implied
but not stored spaces in indexes is going to work for us.

The only solution to this dilema is to actually add the trailing space
character to the index.

notation  (this means the character is added for correct sorting)

0x 43 09
0x 43 (20)
0x 43 20
0x 43 43

Note that the trainiling space is added only to the index values.  Not
the record.

If we do this, I think it must be done as a selectable feature that can
be turned off for 'better' performance.  And we need to see what that
performance difference is.

Kevin