List:Internals« Previous MessageNext Message »
From:Timothy P Clark Date:May 20 2009 7:50pm
Subject:Re: character fields in records_in_range
View as plain text  
Hi Sergei,

Thanks for responding!

Sergei Golubchik <serg@stripped> wrote on 05/19/2009 09:56:21 AM:

> Hi, Timothy!
> 
> On May 18, Timothy P Clark wrote:
> > Sergei Golubchik <serg@stripped> wrote on 05/18/2009 04:34:35 AM:
> > > On May 15, Timothy P Clark wrote:
> > > > 
> > > > We're trying to figure out the best way to determine a key field's
> > > > non-padded length in records_in_range().
> > > > 
> > > > Our storage engine (IBM DB2) stores key values in a format that is
> > > > different from MySQL's. One consequence of this is that we need to
> > > > be able to determine in handler::records_in_range() the true
> > > > length of a padded character field in a key.
> > > 
> > > Why do you need to do that ?
> > 
> > The basic answer is because this is how DB2 requires us to construct 
the 
> > structure that we pass to get the estimate required by 
records_in_range.
> 
> I don't understand that. To get the estimates for records_in_range you
> need to pass in a range. Passing in a truncated value will generally
> result in the incorrect range, won't it ?
In the response I just sent to Monty, I explained that the truncated value 
is exactly what DB2 wants. Because the key length is shorter than the key 
itself, DB2 infers that we want a wildcard estimate equivalent to the LIKE 
clause.

> 
> > The pad values look like invalid data to the DB2 interface, so we have 
to 
> > pass the length of the valid data.
> > Furthermore, since the storage engine often performs character 
conversion 
> > (ASCII<->EBCDIC or UTF8<->UCS2 or Multi-byte<->UCS2) between
> MySQL and 

> > DB2, the pad character may end up with an entirely different codepoint 

> > when passed to DB2, thus losing the min/max value property that has 
within 
> > MySQL. So even if DB2 accepted the pad character, there is no 
guarantee 
> > that it would have the same semantic as is intended for 
records_in_range.
> 
> If the collation in DB2 significantly different from what MySQL assumes
> (which the the only explanation I can come up with for "invalid data"
> and "losing the min/max value property"), then I suspect that
> records_in_range() for LIKE is the least of your problems :)
OK, I probably overstated this. For valid characters, DB2 will collate the 
data correctly. The problem lies more with the invalid data. For example, 
0xFF or 0xFFFF is not a valid character in cp932. If we run that through 
iconv on the way to sending it to DB2, iconv is unhappy. We need to find 
another way to construct the range for DB2.

> 
> You should have problems with ORDER BY and GROUP BY, unless you pretend
> that your indexes are not ordered. You'll be able to do no ranges at
> all.  Even equalities aren't guaranteed to work, I'm afraid.
> 
> Do ORDER BY and, say, BETWEEN ranges work ? If yes - LIKE should work
> automatically. If they are not, I think that fixing LIKE is wrong, it's
> just patching up the sympthoms.
Yes, these do work, but that is because the actual I/O operations (i.e. 
index_read) run through a much different set of interfaces than 
records_in_range. Thus queries always give the right results but not 
necessarily good estimates. DB2 wasn't really designed with the MySQL 
storage engine API in mind. :-)

Thanks,
Tim
Thread
character fields in records_in_rangeTimothy P Clark15 May
  • Re: character fields in records_in_rangeSergei Golubchik18 May
    • Re: character fields in records_in_rangeTimothy P Clark18 May
      • Re: character fields in records_in_rangeSergei Golubchik19 May
        • Re: character fields in records_in_rangeTimothy P Clark20 May
          • Re: character fields in records_in_rangeSergei Golubchik20 May
            • Re: character fields in records_in_rangeTimothy P Clark21 May
  • re: character fields in records_in_rangeMichael Widenius19 May
    • re: character fields in records_in_rangeTimothy P Clark20 May
      • Re: re: character fields in records_in_rangeSergei Golubchik20 May
        • Re: re: character fields in records_in_rangeTimothy P Clark20 May
          • Re: re: character fields in records_in_rangeSergei Golubchik21 May
            • Re: re: character fields in records_in_rangeTimothy P Clark21 May