Part of the problem (at least where myisam is involved) is the way the
optimizer calculates the cost.
2 older patches to gain about 20% in the myisam key lookup are
http://bugs.mysql.com/bug.php?id=26232
http://bugs.mysql.com/bug.php?id=26453
This isn't enough of course. and I am looking fwrd to see how this may
be solved. (Also the above affects myisam only)
Another possibility within the myisam code would be in relation to
packed keys.
( but again, it only does a percentage speedup. The issue will still
exist, if the IN lists are big enough)
With packed keys each node has to be searched from its beginning, for a
matching entry/subnode. Since the optimizer does lookup the "in" values
in sorted order, it would be possible to remember the last position
(with value, so we can unpack subsequent values) in all visited nodes
(remember one pos/value pair for each node between the tree root and the
current entry).
1st lookup "A char value"
- topnode: found "A char string" at position 5 (position 6 starts with
"A char zzzz") (we remember position 6 with a value of "A char zzzz"
(and we remember the 2nd lvl node of pos 5)
- 2nd lvl node found"A char topic" at position 8 (we remember position 9
with a value of "A char www")
....
2nd lookup "a char xxx"
instead of scanning the nodes we start with the remembered result
- topnode: start search at position 5 (save the time to decode all the
previous values)
compare against the remembered value of position 6, still smaller go
to 2nd lvl node
- 2nd lvl node compare against the value at pos 9, => positive
continue the search from pos9 in this node
I agree this is quite a lot of work, and not very generic (only myisam),
but it does have a potential speedup.
-----
A better solution would be to implement a similar search, but with
condition push down.
myisam (or the storage engine) would work the index tree like this
(starting at the top level node)
- for each entry in current node
-- single value: is in condition ? add 1 to cost or skip
-- sub node: check if range of subnode overlaps with condition (eg
subnode starts at value 117 and ends at 193)
- no overlap: don't enter
- matched completely by a single range in condition: add estimated
size of subnode to cost
- matched partially, by one or more ranges: enter subnode for more
details
the push down condition could also take advantage of the fact that all
individual ranges are available in sorted order.
Sergey Petrunia wrote:
> Hi Jay,
>
> On 7/20/07, Jay Pipes <jay@stripped> wrote:
>> Sergey, would this be something that might be better as a server
>> variable, configurable at runtime? Seems like, given your patch below,
>> a community member might be able to put together a proper patch for
>> MySQL to have a configurable server setting to control this behaviour?
>>
>> What do you think? Are there difficulties I should be aware of?
>
> This patch is a rather crude fix targeted to solve one specific case.
> I don't think it should go into the server, nor in its current form
> neither as a server variable. If we start introducing such server
> variables, we will quickly end up with dozens of them, and it will be
> nearly impossible to tune them.
>
> The real fix is to make everything to work automatically. I have one
> similar bug in the works (BUG#20932), and as far as I am aware, we'll
> get a bug entry for Mark's case and will take time to find the root
> causes of the slowdown (there may be several) and create a proper fix.
> The proper fix will likely have little in common with the provided
> band-aid patch.
>
> BR
> Sergey
> --
> Sergey Petrunia, Software Developer
> MySQL AB, www.mysql.com
> Office: N/A
> Blog: http://s.petrunia.net/blog
>