> Lars-Erik Bjørk wrote:
> Hi all!
>
> This week I have been looking at http://bugs.mysql.com/bug.php?id=42208
> (Falcon's ORDER BY .. LIMIT gives wrong/inconsistent results on NULL
> values).
>
> The problem is that NULL values sort together with numeric zero. I have
> talked this issue over with Vlad, and have tried several different
> solutions to make it work for both single-field and multi-field indexes.
> What seems to work, is for Index::makeKey(Field *field, Value *value,
> int segment, IndexKey *indexKey) to return a zero length key for NULL
> values, and to prepend 0x00 to all keys starting with 0x00 (to make NULL
> sort before the empty (string), and to make the empty string, now 0x00,
> sort before the previous 0x00, now 0x0000, etc, etc)
This seems to work for single segment keys. These extended key values
would be written into the index page, so we can add a new
ODS_MINOR_VERSION4 and change;
- #define ODS_MINOR_VERSION ODS_MINOR_VERSION3
+ #define ODS_MINOR_VERSION ODS_MINOR_VERSION4
Then in Dbb::open(), this code would prevent this new engine from
accessing any previously created tablespace.
if (header.odsVersion > ODS_VERSION ||
(header.odsVersion == ODS_VERSION && header.odsMinorVersion >
ODS_MINOR_VERSION))
throw SQLError (VERSION_ERROR, "Falcon on disk structure version %d.%d
is not supported by version %d.%d server",
header.odsVersion, header.odsMinorVersion, ODS_VERSION,
ODS_MINOR_VERSION);
But the solution above does not address how null values are stored and
compared in multiu-segmented keys. The early segments are all padded
out to the nearest RUN length so that NULLs and zero-length values look
like 0x00, which also look like 0x0000 and 0x000000 and 0x00000000 and
0x0000000000. They all sort together. There is nothing in the sorted
RUN-length encoding which indicates the length of an early segment, is
there? This is an encoding which is sortable byte-by-byte while being
agnostic of data type.
Someone said last week that there is a limit in MySQL of 32 key
segments. Our 1 byte 'segment number' at the start of each RUN is a
one-base counter. Maybe there is a way to combine 5 bits for
segmentNumber (0 - 31) with 3 bits of actual field length within the RUN...
> Being in this area, I picked up http://bugs.mysql.com/bug.php?id=23692
> again, which we have discussed on a previous meeting. The idea Kevin
> proposed here (if I remember correctly) was to append 0x20 to all
> upper-bound search keys ending in >= 0x20 (Kevin, please correct me if
> my memory is corrupt). This means passing an extra argument all the way
> from StorageTable::setIndexBounds down to Index::makeKey, because we
> don't want to store the extra byte in the regular index keys.
>
> How does the rest of you feel about these solutions? I get this hacky
> feeling:) We have to make sure there is always room for a 0x00 and a
> 0x20 in the key, meaning that the actual key can no longer be
> MAX_PHYSICAL_KEY_LENGTH long, but two less, as well as [prep,app]ending
> 0x00 and 0x20 being pretty magic.
>
> Is this something that you are comfortable with? I don't have any
> alternative solutions:)
We do not even have to change the ODS version to add an extra 0x20 to
all string fields stored. This would give the correct sorting, if the
appending of the 0x20 is done for every string field that goes through
makeKey. But it would also create larger indexes since Falcon uses
prefix compression only, not postfix compression (or whatever it would
be called).
I think the idea that we are ruminating about is whether we can get away
with adding the extra 0x20 only at comparison time, or immediately
before that. To do this, we need to know the field type. But any
search value or index value that comes out of makeKey is type agnostic.
It would be a big and difficult change, in my mind, if we tried to
carry the field type metadata past makeKey into the world of falcon key
comparison and index storage!
I do not know any other alternative to actually storing the extra 0x20
in the index page. A search key comes in from the server, goes through
makeKey, and then is ready to compare to the contents of an index page
or deferredIndex. And it has now become field-type agnostic.
Kevin