From: Kevin Lewis
Date: February 18 2009 2:50pm
Subject: Re: Patch for bug#42208
List-Archive: http://lists.mysql.com/falcon/543
Message-Id: <499C202D.9010808@sun.com>
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=ISO-8859-1
Content-Transfer-Encoding: 8BIT

Vlad,

I guess I did not understand the full implementation you described below.
Are you proposing a new way of storing multi-segmented keys? Is this new
method still searchable and comparable byte-wise left to right without
knowledge of the key type? Maybe you can explain this idea a little more
thoroughly.

Kevin

Vladislav Vaintroub wrote:
> Does anyone want to comment on that?
> The change would be cheap and fixes at least the current multisegment
> padding problem, where 0x00==0x0000==0x000000...
>
>> -----Original Message-----
>> From: Vladislav.Vaintroub@stripped [mailto:Vladislav.Vaintroub@stripped]
>> On Behalf Of Vladislav Vaintroub
>> Sent: Wednesday, February 18, 2009 12:45 AM
>> To: Kevin.Lewis@stripped; 'Lars-Erik Bjørk'
>> Cc: 'Jim Starkey'; 'FalconDev'
>> Subject: RE: Patch for bug#42208
>>
>> Guys, while we're still in the alpha stage, maybe we could fix
>> multisegments too?
>>
>> Many (i.e. 7 or 8) years ago we used the following scheme for
>> multisegment keys without padding.
>>
>> Suppose we have keys A and B and want to make a multisegment key out of
>> them.
>>
>> The resulting key would be
>>     f(A) 0x00 f(B)
>>
>> 0x00 serves as the separator and f() is a transformation that converts
>>
>>     0x00 => 0x01 0x00
>>     0x01 => 0x01 0x01
>>
>> Any other byte remains unchanged. 0x00s at the end can be compressed, so
>> we get an efficient key if there are only/many NULLs.
>>
>> I do not think the scheme is much more complicated than RUN length and
>> padding.
>>
>> Vlad
>>
>> And while we're at it, why not fix the integer representation ;) Doubles
>> are a strange creation by mathematicians; exact longlongs would be really
>> nice, no?
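
The scheme quoted above can be illustrated with a minimal Python sketch.
This is not Falcon code: the names escape_segment, encode_key and
decode_key are hypothetical, and the "0x00s at the end can be compressed"
step is deliberately left out because the message does not say exactly how
it worked. The sketch only shows that f(A) 0x00 f(B) stays comparable
byte-wise left to right and keeps 0x00 and 0x0000 distinct.

```python
SEP = 0x00  # segment separator, as in "f(A) 0x00 f(B)"
ESC = 0x01  # escape byte used by f()


def escape_segment(segment):
    """f(): 0x00 -> 0x01 0x00, 0x01 -> 0x01 0x01, other bytes unchanged."""
    out = bytearray()
    for b in segment:
        if b in (SEP, ESC):
            out.append(ESC)
        out.append(b)
    return bytes(out)


def encode_key(segments):
    """Join the escaped segments with a bare 0x00 separator."""
    return bytes([SEP]).join(escape_segment(s) for s in segments)


def decode_key(key):
    """Inverse of encode_key (assumes a well-formed key)."""
    segments = [bytearray()]
    i = 0
    while i < len(key):
        b = key[i]
        if b == ESC:
            # Escaped byte: take the following byte literally.
            segments[-1].append(key[i + 1])
            i += 2
        elif b == SEP:
            # Bare separator: start the next segment.
            segments.append(bytearray())
            i += 1
        else:
            segments[-1].append(b)
            i += 1
    return [bytes(s) for s in segments]


if __name__ == "__main__":
    a = encode_key([b"\x00", b"x"])
    b = encode_key([b"\x00\x00", b"x"])
    print(a.hex(), b.hex())
    print(a < b)
```

Because a bare 0x00 can only be a separator and every escaped byte starts
with 0x01 or higher, a shorter first segment always sorts before a longer
one with the same prefix, so a plain byte-wise comparison of the encoded
keys should agree with a segment-by-segment comparison.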
>>
>>> -----Original Message-----
>>> From: Kevin.Lewis@stripped [mailto:Kevin.Lewis@stripped]
>>> Sent: Tuesday, February 17, 2009 11:55 PM
>>> To: Lars-Erik Bjørk
>>> Cc: Jim Starkey; Vladislav Vaintroub; 'FalconDev'
>>> Subject: Re: Patch for bug#42208
>>>
>>> According to the blog link forwarded by Mark, Oracle customers do not
>>> like that zero length strings (which are equal to each other) are
>>> automatically converted to NULLs. Both suggestions take care of that in
>>> Falcon. So this is the most important thing: make a zero length string
>>> equal to 0x00, length 1.
>>>
>>> The question is whether to keep adding 0x00 to other lengths of binary
>>> zero strings. Jim says it does not matter to anyone but QA that 0x00
>>> and 0x0000 sort separately. And Vlad points out that even if we did
>>> this for single field keys, it would not sort them differently for
>>> multisegment keys, since we always pad them to a RUN length. I think
>>> that if it does not cause any extra difficulties or complexity in the
>>> code, why not keep QA happy for single segment keys.
>>>
>>> And I am still unclear why this little change in index order should
>>> cause us to change the ODS format while still in the alpha stage. What
>>> is the downside of a new engine that starts converting zero length
>>> strings into 0x00? New entries will be added to the index after the
>>> NULLs. Older zero length strings would be mixed up with the NULLs and
>>> may not be found by direct searches until the index is rebuilt. We can
>>> document that as a bug fix in the index, which it is. Nobody's critical
>>> data is depending on us finding all zero length strings.
>>>
>>> Kevin
>>>
>>> Lars-Erik Bjørk wrote:
>>>> Ok, so we probably don't want to do the caching after all then? Does
>>>> anyone else have an opinion on how to proceed on this? Do we agree on
>>>> any best approach?
>>>>
>>>> /Lars-Erik
>>>>
>>>> Jim Starkey wrote:
>>>>> Vladislav Vaintroub wrote:
>>>>>> Hi Lars-Erik,
>>>>>> I wonder if adding 0x00 to the (binary) string values that already
>>>>>> start with 0x00 would not be less work than modifying the index
>>>>>> walker etc. This looks like a huge amount of work you have done
>>>>>> (good), but I wonder if there is a good reason for it. Assuming
>>>>>> (binary) strings that start with 0x00 are really rare, prepending
>>>>>> 0x00 to a key after a check is not going to be an expensive
>>>>>> operation. And that makes NULL *really* different from other index
>>>>>> values. And that maybe allows, in some distant future, index-only
>>>>>> access, so you can answer "is null/is not null" without additionally
>>>>>> accessing the record, and this is a real performance advantage.
>>>>>>
>>>>> Why do you want to do that? Is the following sufficient:
>>>>>
>>>>> 1. A null is represented as either a zero length key or a missing
>>>>>    segment in a multi-segment key. This collates lowest.
>>>>> 2. A zero length binary key is represented by a single byte of zero.
>>>>> 3. A binary key with a single zero byte is indistinguishable from a
>>>>>    zero length (but non-null) key.
>>>>> 4. A binary key with a leading zero byte and a subsequent non-zero
>>>>>    byte will collate above #2 and #3.
>>>>>
>>>>> I don't think we really care about the ordering of a non-null, zero
>>>>> length key and an all zero binary key. I don't think anyone else
>>>>> should, either.
>>>>>
>>
>> --
>> Falcon Storage Engine Mailing List
>> For list archives: http://lists.mysql.com/falcon
>> To unsubscribe: http://lists.mysql.com/falcon?unsub=wlad@stripped
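
The two single-segment proposals in the thread can be compared with a
small Python sketch. This is an illustration only, not Falcon code: the
function names are hypothetical, None stands in for SQL NULL, and the
second function is one reading of Vlad's suggestion (prepend 0x00 to a
non-null value that is empty or already starts with 0x00).

```python
def encode_rules_1_to_4(value):
    """Jim's rules: NULL is the empty key, '' becomes a single 0x00."""
    if value is None:
        return b""        # rule 1: NULL collates lowest
    if value == b"":
        return b"\x00"    # rule 2: zero length key is one zero byte
    return value          # rules 3 and 4: all other keys are stored as-is


def encode_with_prefix(value):
    """Assumed reading of Vlad's idea: prefix 0x00 when the value is
    empty or already begins with 0x00, so '' and 0x00 stay distinct."""
    if value is None:
        return b""
    if value == b"" or value[0] == 0x00:
        return b"\x00" + value
    return value


if __name__ == "__main__":
    for v in (None, b"", b"\x00", b"\x00\x01", b"\x01"):
        print(v, encode_rules_1_to_4(v).hex(), encode_with_prefix(v).hex())
```

Under the first function a zero length string and a single 0x00 byte
produce the same key (rule 3), which is the ordering Jim argues nobody
should care about. Under the second they differ, which is the case Kevin
describes as keeping QA happy for single segment keys.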