According to the blog link forwarded by Mark, Oracle customers don't
like that zero-length strings (which are equal to each other) are
automatically converted to NULLs. Both suggestions take care of that in
Falcon. So this is the most important thing: make a zero-length string
equal to 0x00, length 1.
The question is whether to keep adding 0x00 to other lengths of
binary-zero strings. Jim says it does not matter to anyone but QA that
0x00 and 0x0000 sort separately. And Vlad points out that even if we
did this for single-field keys, it would not sort them differently for
multisegment keys, since we always pad them to a RUN length. I think
that if it does not cause any extra difficulty or complexity in the
code, why not keep QA happy for single-segment keys.
And I am still unclear why this little change in index order should
cause us to change the ODS format while still in the alpha stage. What
is the downside of a new engine that starts converting zero-length
strings into 0x00? New entries will be added to the index after the
NULLs. Older zero-length strings would be mixed up with the NULLs and
may not be found by direct searches until the index is rebuilt. We can
document that as a bug fix in the index, which it is. Nobody's critical
data depends on us finding all zero-length strings.
Lars-Erik Bjørk wrote:
> Ok, so we probably don't want to do the caching after all then? Does
> anyone else have an opinion on how to proceed on this? Do we agree on
> any best approach?
> Jim Starkey wrote:
>> Vladislav Vaintroub wrote:
>>> Hi Lars-Erik,
>>> I wonder if adding 0x00 to the (binary) string values that already
>>> start with 0x00 would be less work than modifying the index walker,
>>> etc. This looks like a huge amount of work you have done (good), but
>>> I wonder if there is a good reason for it. Assuming (binary) strings
>>> that start with 0x00 are really rare, prepending 0x00 to a key after
>>> a check is not going to be an expensive operation. And that makes
>>> NULL *really* different from other index values. And that maybe
>>> allows, in some distant future, index-only access, so you can answer
>>> "is null/is not null" without accessing the record, and this is a
>>> real performance advantage.
>> Why do you want to do that? Is the following sufficient:
>> 1. A null is represented as either a zero length key or a missing
>> segment in a multi-segment key. This collates lowest.
>> 2. A zero length binary key is represented by a single byte of zero.
>> 3. A binary key with a single zero byte is indistinguishable from a
>> zero length (but non-null) key.
>> 4. A binary key with a leading zero byte and a subsequent non-zero
>> byte will collate above #2 and #3.
>> I don't think we really care about the ordering of a non-null,
>> zero-length key and an all-zero binary key. I don't think anyone
>> else should, either.
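
For reference, the four rules Jim enumerates could be sketched roughly
as below. This is only an illustration of the proposed collation, not
Falcon's actual code; the name encode_key is made up.

```python
def encode_key(value):
    """Encode a binary value (or None for SQL NULL) as an index key,
    per the four rules discussed above."""
    if value is None:
        return b""           # rule 1: NULL -> zero-length key, collates lowest
    if value == b"":
        return b"\x00"       # rule 2: empty string -> single 0x00 byte
    return bytes(value)      # rules 3-4: other values are stored unchanged

# Rule 3: an empty string and a single-0x00 string encode identically,
# so they are indistinguishable in the index.
assert encode_key(b"") == encode_key(b"\x00")

# Rules 1, 2, 4: byte-wise sorting of the encoded keys puts NULL first,
# then the empty string, then longer leading-zero keys, then the rest.
order = sorted(encode_key(v) for v in (None, b"", b"\x00\x01", b"ab"))
print(order)   # [b'', b'\x00', b'\x00\x01', b'ab']
```

Note that the only ambiguity is exactly the one the thread agrees not
to care about: b"" and b"\x00" collapse to the same key.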