From: Kevin Lewis
Date: December 3 2008 6:07pm
Subject: Re: Please review fox for bug#34479
List-Archive: http://lists.mysql.com/falcon/239
Message-Id: <4936CAE1.5090109@sun.com>
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=UTF-8
Content-Transfer-Encoding: 8BIT

Expanding on what Ann wrote:

Ann W. Harrison wrote:
> Expanding on what Kevin Lewis wrote:
>
>> >> Bar wrote:
>>>> Sorry, I'm not strong with Falcon internals,
>>>> so I don't know why you need to trim trailing minSortChar.
>>>> This makes MySQLCollation::compare() work differently from
>>>> how collation really works.
>>>>
>>>> Can you please give some insight for this?
>>
>> > Lars-Erik Bjørk wrote:
>>> I think I will pass that one on to somebody else:) Maybe you could
>>> explain this briefly, Kevin?
>>
>> The Falcon internal encoded record does not store trailing white
>> space. Jim Starkey has declared many times that he is on a mission to
>> replace the use of char[anything], varchar[anything], etc. with just
>> 'string'. Falcon does that internally. I also see no reason to store
>> what does not matter.
>
> To clarify slightly, Falcon removes trailing spaces from strings; it
> does not remove trailing tab characters, which are often called "white
> space".
>
> One problem with removing trailing spaces is that some values that
> can appear in strings sort lower than the space character, and the SQL
> standard says that in a string comparison the shorter string is to be
> padded with spaces to the length of the longer string. As long
> as all strings are space padded to their full length, that doesn't
> matter - the comparisons work naturally.
>
> However, when comparing strings of different lengths, the right
> answer is less obvious. Logically it would seem that a two character
> string sorts lower than any three character string that starts with
> the same two characters. Logic, SQL, and the mother of all character
> sets, ASCII, aren't a good combination.
>
> The correct order of these strings (in most collations) is...
>
> ab<0x0>
> ab
> ab
> aba
>
> For a long time, Falcon got that wrong in indexes and considered
> 'ab' to be less than 'ab<0x0>'. Recently, Kevin had a brilliant
> idea. When handling a string key value, both for storage and for
> comparison, add an imaginary space to it. There's no reason to
> store the space as long as everyone behaves as if it were stored.
>
> So now the values above are treated as if they were:
>
> ab<0x0><0x20>
> ab<0x20>
> ab<0x20>
> aba<0x20>
>
> which, however ugly, is at least clear.

Well, this is not quite implemented yet. The bug is being worked on by
Lars-Erik and does not have a BETA tag. I think it is:
Bug #23692 Falcon: searches fail if data is 0x00
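
For anyone following the thread, the imaginary-space idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Falcon code: the function names are hypothetical, a plain bytewise comparison stands in for a real collation, and the sketch only checks the four example values from the message above (with the second 'ab' assumed to carry a trailing space).

```python
from functools import cmp_to_key

# Example values from the thread; the trailing space on b"ab " is an assumption.
VALUES = [b"aba", b"ab", b"ab\x00", b"ab "]


def strip_trailing_spaces(value: bytes) -> bytes:
    """Drop trailing 0x20 bytes only; tabs and other bytes are kept."""
    return value.rstrip(b" ")


def sort_key(value: bytes) -> bytes:
    """Hypothetical key: the stripped bytes plus one imaginary space."""
    return strip_trailing_spaces(value) + b" "


def padded_compare(a: bytes, b: bytes) -> int:
    """Pad the shorter value with spaces, then compare bytewise."""
    width = max(len(a), len(b))
    pa, pb = a.ljust(width, b" "), b.ljust(width, b" ")
    return (pa > pb) - (pa < pb)


# Stripping alone puts 'ab' ahead of 'ab<0x0>', the old index behaviour.
print(sorted(VALUES, key=strip_trailing_spaces))
# With the imaginary space, 'ab<0x0>' sorts first.
print(sorted(VALUES, key=sort_key))
# Space-padded comparison of the unstripped values, for reference.
print(sorted(VALUES, key=cmp_to_key(padded_compare)))
```

For these four values the last two orderings agree, which is the point of the trick: nothing extra is stored, but comparisons behave as if the space were there. The sketch makes no claim beyond these examples.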