From: Ann W. Harrison
Date: December 3 2008 5:45pm
Subject: Re: Please review fox for bug#34479
List-Archive: http://lists.mysql.com/falcon/240
Message-Id: <4936C5AE.2080606@mysql.com>
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=UTF-8
Content-Transfer-Encoding: 8BIT

Expanding on what Kevin Lewis wrote:
>
>> Bar wrote;
>>> Sorry, I'm not strong with Falcon internals,
>>> so I don't know why you need to trim trailing minSortChar.
>>> This makes MySQLCollation::compare() work differently from
>>> how collation really works.
>>>
>>> Can you please give some insight for this?
>
>
> Lars-Erik Bjørk wrote:
>> I think I will pass that one on to somebody else:) Maybe you could
>> explain this briefly, Kevin?
>
> The Falcon internal encoded record does not store trailing white space.
> Jim Starkey has declared many times that he is on a mission to replace
> the use of char[anything], varchar[anything], etc with just 'string'.
> Falcon does that internally. I also see no reason to store what does
> not matter.

To clarify slightly, Falcon removes trailing spaces from strings; it
does not remove trailing tab characters, which are often called
"white space".

One problem with removing trailing spaces is that some values that
can appear in strings sort lower than the space character, and the
SQL standard says that in a string comparison the shorter string is
to be padded with spaces to the length of the longer string.

As long as all strings are space padded to their full length, that
doesn't matter - the comparisons work naturally. However, when
comparing strings of different lengths, the right answer is less
obvious. Logically it would seem that a two character string sorts
lower than any three character string that starts with the same two
characters. Logic, SQL, and the mother of all character sets, ASCII,
aren't a good combination. The correct order of these strings (in
most collations) is...

ab<0x0>
ab
ab
aba

For a long time, Falcon got that wrong in indexes and considered
'ab' to be less than 'ab<0x0>'.

Recently, Kevin had a brilliant idea. When handling a string key
value, both for storage and for comparison, add an imaginary space
to it. There's no reason to store the space as long as everyone
behaves as if it were stored. So now the values above are treated
as if they were:

ab<0x0><0x20>
ab<0x20>
ab<0x20>
aba<0x20>

which, however ugly, is at least clear.

Best,

Ann
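
The imaginary-space idea described in this message can be sketched in a few lines of Python. This is only an illustration, not Falcon's code: it assumes a plain byte-wise (binary) collation, the function names are hypothetical, and it assumes that one of the two 'ab' entries in the list originally carried a trailing space. It checks nothing beyond the four example values.

```python
# Hypothetical sketch, not Falcon code. Assumes byte-wise (binary) collation.

def stored_form(value: bytes) -> bytes:
    """Drop trailing spaces (0x20 only; tabs and other bytes are kept)."""
    return value.rstrip(b" ")

def naive_key(value: bytes) -> bytes:
    """Compare the stored form as is, so a shorter prefix always sorts first."""
    return stored_form(value)

def imaginary_space_key(value: bytes) -> bytes:
    """Compare as if a single space followed the stored form."""
    return stored_form(value) + b" "

# Assumption: one of the two 'ab' entries originally had a trailing space.
values = [b"aba", b"ab", b"ab\x00", b"ab "]

naive_order = sorted(values, key=naive_key)            # 'ab' lands before 'ab<0x0>'
fixed_order = sorted(values, key=imaginary_space_key)  # 'ab<0x0>' lands first

print(naive_order)
print(fixed_order)
```

With the naive key, 'ab' sorts ahead of 'ab<0x0>' simply because it is a shorter prefix. With the imaginary space appended, the third byte compared is 0x20 against 0x00, so 'ab<0x0>' sorts first, matching the order given in the message.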