List:Falcon Storage Engine« Previous MessageNext Message »
From:Vladislav Vaintroub Date:February 18 2009 4:07pm
Subject:RE: Patch for bug#42208
View as plain text  
And here is an example where existing schema breaks proposed schema not.
PROPOSED encoding (works well)
('\0','a')     =>  1 0 0  a
('\0\0', 'a')  =>  1 0 1  0  0 a 

Existing schema would pad the value to multiple of RUN (6? 5?) bytes and
sorts things with different number or trailing NULL the same
EXISTING encoding (broken)
('\0','a')     => 2 0 0 0 0 0 1 a 
('\0\0', 'a')  => 2 0 0 0 0 0 1 a
 

> -----Original Message-----
> From: Vladislav.Vaintroub@stripped [mailto:Vladislav.Vaintroub@stripped]
> On Behalf Of Vladislav Vaintroub
> Sent: Wednesday, February 18, 2009 4:43 PM
> To: Kevin.Lewis@stripped
> Cc: 'Lars-Erik Bjørk'; 'Jim Starkey'; 'FalconDev'
> Subject: RE: Patch for bug#42208
> 
> 
> 
> > -----Original Message-----
> > From: Kevin.Lewis@stripped [mailto:Kevin.Lewis@stripped]
> > Sent: Wednesday, February 18, 2009 3:50 PM
> > To: Vladislav Vaintroub
> > Cc: 'Lars-Erik Bjørk'; 'Jim Starkey'; 'FalconDev'
> > Subject: Re: Patch for bug#42208
> >
> > Vlad,
> >
> > I guess I did not understand the full implementation you described
> > below.  Are you proposing a new way of storing  multi-segmented keys?
> > Is this new method still searchable and comparable byte-wise left to
> > right without knowledge of the key type?
> >
> > Maybe you can explain this idea a little more thoroughly.
> 
> Ok, I will try with an example.
> 
> Again, fields are A and B and we're constructing multisegment key
> 
> I will use (stringA, stringB) notation for tuple of A and B
> 
> Example 1:
> 
>  ('a','b')  would transform to   a 0 b
>  ('ab','c')                 to   a b 0 c
>  ('a', 'bc')                to   a 0 b c
> 
> Natural sorting order            ('a','b') < ('a', 'bc') < ('ab','c')
> corresponds to key sorting order  a 0 b    < a 0 b c     < a b 0 c
> 
> Example 2:
> 
> Now with special values 0 and 1 that translate to 0x0100 and 0x0101
> correspondingly
> 
> ('a\0b' , '\1') => a 1 0 b 0 1 1
> ('a', '\0b\1')  => a 0 1 0 b 1 1
> Also here the natural order is preserved.
> 
> Example 3:
> 
> ('a', NULL, 'b') => a 0 0 b   ( a  0<zero length key>0 b )
> ('a', 'b', NULL) => a 0 b 0   (last 0s can be stripped for compression)
> Also here natural order is preserved.
> 
> Also note, that it is simple  to parse key and restore individual
> values.
> Read byte-by-byte, skip leading 0x01 (it denotes 2 byte sequence) until
> you
> get 0.
> 
> Parsing first  part of  a 1 0 b 0 1 1
> Would be
> - Read 'a'
> - Skip 1
> - Read 0
> - Read 'b'
> - Stop (found 0)
> 
> 
> Please ask if you need more examples or I could not explain the
> encoding
> clearly.
> 
> 
> Vlad.
> 
> 
> 
> 
> 
> 
> 
> > Vladislav Vaintroub wrote:
> > > Anyone wants to comment on that?
> > > The change would be cheap and fixes at least current multisegment
> > padding
> > > problem, where 0x00==0x0000==0x000000...
> > >
> > >> -----Original Message-----
> > >> From: Vladislav.Vaintroub@stripped
> > [mailto:Vladislav.Vaintroub@stripped]
> > >> On Behalf Of Vladislav Vaintroub
> > >> Sent: Wednesday, February 18, 2009 12:45 AM
> > >> To: Kevin.Lewis@stripped; 'Lars-Erik Bjørk'
> > >> Cc: 'Jim Starkey'; 'FalconDev'
> > >> Subject: RE: Patch for bug#42208
> > >>
> > >> Guys, why we're still on alpha stage, maybe we could fix
> > multisegments
> > >> too?
> > >>
> > >> Many (i.e 7 or 8) years  ago we used following schema for
> > multisegment
> > >> keys
> > >> without padding
> > >>
> > >> Suppose we have keys A and B and want to make a multisegment key
> out
> > of
> > >> it.
> > >>
> > >> The resulting key would be
> > >> f(A) 0x00 f(B)
> > >>
> > >> 0x00 serves as separator and f() is a transformation that converts
> > >>
> > >>   0x00=>0x01 0x00
> > >>   0x01=>0x01 0x01
> > >>
> > >> Any other byte remains unchanged.0x00s at the end can be
> compressed,
> > so
> > >> we
> > >> get efficient key is there are only/many NULLs.
> > >>
> > >> I do not think the schema is much more complicated than RUN length
> > and
> > >> padding.
> > >>
> > >> Vlad
> > >>
> > >> And while we're on it why not to fix integer representation;)
> > Doubles
> > >> are
> > >> strange creation by mathematician, exact longlongs would be really
> > >> nice,
> > >> not?
> > >>
> > >>
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: Kevin.Lewis@stripped [mailto:Kevin.Lewis@stripped]
> > >>> Sent: Tuesday, February 17, 2009 11:55 PM
> > >>> To: Lars-Erik Bjørk
> > >>> Cc: Jim Starkey; Vladislav Vaintroub; 'FalconDev'
> > >>> Subject: Re: Patch for bug#42208
> > >>>
> > >>> According to the blog link forwearded by Mark, Oracle customers
> don
> > >> not
> > >>> like that zero length strings (which ar equal to each other) are
> > >>> automatically converted to NULLs.  Both suggestions take care of
> > that
> > >>> in
> > >>> Falcon.  So this is the most inportant thing; make a zero length
> > >> string
> > >>> equal to 0x00 length 1.
> > >>>
> > >>> The question is whether to keep adding 0x00 to other lengths of
> > >> binary
> > >>> zero strings.  Jim says it does not matter to anyone but QA that
> > 0x00
> > >>> and 0x0000 sort separately.  And Vlad points out that even if we
> > did
> > >>> this for single field keys, it would not sort them differently
> for
> > >>> multisegment key since we always pad them to a RUN length.  I
> think
> > >>> that
> > >>> if it does not cause any extra difficulties or comlexity in the
> > code,
> > >>> why not keep QA happy for single segment keys.
> > >>>
> > >>> And I still am unclear why this little change in index order
> should
> > >>> cause us to change the ODS format while still in the alpha stage.
> > >> What
> > >>> is the downside of a new engine that starts converting zero
> length
> > >>> strings into 0x00?  New entries will be added to the index after
> > the
> > >>> NULLS.  Older zero length strings would be mixed up with the
> NULLS
> > >> and
> > >>> may not be found for direct searched until the index is rebuilt.
> > We
> > >>> can
> > >>> document that as a bug fix in the index, which it is. Nobodies
> > >> critical
> > >>> data is depending on us finding all zero length strings.
> > >>>
> > >>> Kevin
> > >>>
> > >>> Lars-Erik Bjørk wrote:
> > >>>> Ok, so we probably don't want to do the caching after all
> then?
> > >> Does
> > >>>> anyone else have an opinion on how to proceed on this? Do we
> agree
> > >> on
> > >>>> any best approach?
> > >>>>
> > >>>> /Lars-Erik
> > >>>>
> > >>>> Jim Starkey wrote:
> > >>>>> Vladislav Vaintroub wrote:
> > >>>>>> Hi Lars-Erik,
> > >>>>>> I wonder if adding 0x00 to the (binary) string values
> that
> > >> already
> > >>> start
> > >>>>>> with 0x00 would not be less works that modifying index
> walker
> > >> etc.
> > >>> This
> > >>>>>> looks like huge amount of work you have done (good) but
> I
> wonder
> > >> if
> > >>>>>> there is
> > >>>>>> a good reason for it. Assuming (binary) strings that
> start
> with
> > >>> 0x00 are
> > >>>>>> really seldom, prepending 0x00 to a key after a check
> is not
> > >> going
> > >>> to
> > >>>>>> be an
> > >>>>>> expensive operation. And that makes NULL *really*
> different
> from
> > >>>>>> other index
> > >>>>>> values. And that allows maybe in some distant future
> index-
> only
> > >>>>>> access, so
> > >>>>>> you can answer "is null/is not null" without extra
> accessing
> the
> > >>>>>> record and
> > >>>>>> this is a real performance advantage.
> > >>>>>>
> > >>>>>>
> > >>>>> Why do you want to do that?  Is the following sufficient:
> > >>>>>
> > >>>>>   1. A null is represented as either a zero length key or
> a
> > >> missing
> > >>>>>      segment in a multi-segment key.  This collates
> lowest.
> > >>>>>   2. A zero length binary key is represented by a single
> byte
> of
> > >>> zero.
> > >>>>>   3. A binary key with a single zero byte is
> indistinquishable
> > >> from
> > >>> a
> > >>>>>      zero length (but non-null) key
> > >>>>>   4. A binary key with a leading zero byte and a
> subsequent
> non-
> > >> zero
> > >>>>>      byte will collate about #2 and #3.
> > >>>>>
> > >>>>> I don't think we really care about the ordering of a
> non-null,
> > >> zero
> > >>>>> length key and and all zero binary key.  I don't think
> anyone
> > else
> > >>>>> should, either.
> > >>>>>
> > >>>>
> > >>
> > >> --
> > >> Falcon Storage Engine Mailing List
> > >> For list archives: http://lists.mysql.com/falcon
> > >> To unsubscribe:
> http://lists.mysql.com/falcon?unsub=1
> > >
> > >
> > >
> > > --
> > > Falcon Storage Engine Mailing List
> > > For list archives: http://lists.mysql.com/falcon
> > > To unsubscribe:
> > http://lists.mysql.com/falcon?unsub=1
> > >
> > >
> 
> 
> --
> Falcon Storage Engine Mailing List
> For list archives: http://lists.mysql.com/falcon
> To unsubscribe:    http://lists.mysql.com/falcon?unsub=1


Thread
Patch for bug#42208Lars-Erik Bjørk16 Feb
  • RE: Patch for bug#42208Vladislav Vaintroub16 Feb
    • RE: Patch for bug#42208Vladislav Vaintroub16 Feb
      • Re: Patch for bug#42208Jim Starkey16 Feb
        • RE: Patch for bug#42208Vladislav Vaintroub16 Feb
    • Re: Patch for bug#42208Jim Starkey16 Feb
      • RE: Patch for bug#42208Vladislav Vaintroub16 Feb
      • Re: Patch for bug#42208Lars-Erik Bjørk17 Feb
        • Re: Patch for bug#42208Kevin Lewis17 Feb
          • Re: Patch for bug#42208Ann W. Harrison18 Feb
            • Re: Patch for bug#42208Ann W. Harrison18 Feb
              • Re: Patch for bug#42208Ann W. Harrison18 Feb
                • Re: Patch for bug#42208Kevin Lewis18 Feb
                  • Re: Patch for bug#42208Ann W. Harrison18 Feb
                    • Re: Patch for bug#42208Kevin Lewis18 Feb
          • RE: Patch for bug#42208Vladislav Vaintroub18 Feb
            • RE: Patch for bug#42208Vladislav Vaintroub18 Feb
              • Re: Patch for bug#42208Kevin Lewis18 Feb
                • RE: Patch for bug#42208Vladislav Vaintroub18 Feb
                  • RE: Patch for bug#42208Vladislav Vaintroub18 Feb
              • Re: Patch for bug#42208Jim Starkey18 Feb
                • RE: Patch for bug#42208Vladislav Vaintroub18 Feb
                  • Re: Patch for bug#42208Jim Starkey18 Feb
                    • RE: Patch for bug#42208Vladislav Vaintroub18 Feb
                      • RE: Patch for bug#42208Vladislav Vaintroub18 Feb
            • Re: Patch for bug#42208Ann W. Harrison18 Feb
              • RE: Patch for bug#42208Vladislav Vaintroub18 Feb
                • Re: Patch for bug#42208Jim Starkey18 Feb
                  • Re: Patch for bug#42208Ann W. Harrison18 Feb
                    • Re: Patch for bug#42208Ann W. Harrison18 Feb
                • Re: Patch for bug#42208Ann W. Harrison18 Feb
                  • RE: Patch for bug#42208Vladislav Vaintroub18 Feb
                    • Re: Patch for bug#42208Ann W. Harrison18 Feb
                    • Re: Patch for bug#42208Kevin Lewis18 Feb
                      • Re: Patch for bug#42208Ann W. Harrison18 Feb
      • Re: Patch for bug#42208MARK CALLAGHAN17 Feb
  • Re: Patch for bug#42208Jim Starkey16 Feb