List: | Internals | « Previous MessageNext Message » | |

From: | Sergey Petrunya | Date: | July 15 2009 10:20am |

Subject: | Re: help with index_merge and clustering keys | ||

View as plain text |

On Sun, Jul 12, 2009 at 01:01:14PM -0400, Zardosht Kasheff wrote: > ah, yes, I made the mistake of making my example have equality checks > instead of ranges. > > So instead of "= 1", suppose each of the range checks are "> 1" (or > > 100000, some value where the optimizer needs records_in_range to > determine the cost estimate of using a key). > > So for the simple example Rich mentioned, "select * from foo where a > > 1 and b > 1" > The more complex example, "select * from foo where a > 1 and b > 1 > and c > 1 and d > 1" > > What happens here? Ok, here it goes: The whole process is in opt_range.cc, SQL_SELECT::test_quick_select() function. get_mm_tree() call analyzes the WHERE condition and constructs possible range/index_merge accesses. Then get_key_scans_params() is called, which uses records_in_range() calls to get #records estimates and costs for range accesses (i.e. accesses using individual indexes), and picks the best range access (this is fairly easy as the number of options is limited by the number of available indexes). Then it starts looking for best ROR[*]-intersection. This is more complicated as N ranges over different indexes give 2^N-N possible ways to intersect it. This is done in two steps * get_best_ror_intersect() searches for best ROR-intersection Then get_best_ror_intersect() is called which tries to construct a ROR-intersection. The idea is that we available range scans by their desirability: ranges[]= {range_scan(key1), range_scan(key2), range_scan(key3) } and then consider doing intersect on ranges[0], ranges[1] ranges[0], ranges[1], ranges[2] ranges[0], ranges[1], ranges[2], ranges[3] ... and so forth and pick the best intersection. After that, get_best_covering_ror_intersect() is called which uses a similar approach but only considers covering intersections (i.e. those where one won't have to call rnd_pos() to retrieve the full table record). This algorithm works, albeit not fully, it is known to make poor choices in certain situations. Hope this clarifies things a bit. [*] ROR=Rowid Ordered Retrieval. At the moment we only support intersection of ordered streams, more exactly rowid-ordered. index scans are expected produce records in rowid order when: * it is a scan on clustered primary key * it is a scan on "keyXpart1=const AND .. keyXpartN=const", where N is the last [visible] part. BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog