List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:November 1 2001 9:12pm
Subject:Re: Possible bug in self-join order optimization
View as plain text  
Hi!

>>>>> "Eric" == Eric  <ej@stripped> writes:

Eric> Sorry for my delayed response...have been communicating with the
Eric> customer.  My situation has changed.

Eric> First let me acknowledge that creating the multi-part indexes did
Eric> speed up the queries I was running signifigantly.  However, it did not
Eric> help MySQL get the join order correct.  This is not a suprise to me,
Eric> as the only way MySQL could get this join right is to use metadata
Eric> based on the range distribution as you have described (and I have been
Eric> considering).

Good that things got better;  I agree that having some more meta-data
would make MySQL even better for cases like yours.

Eric> The large change in my situation is that I've discovered that our
Eric> customer is indexing every column of the table individually, except
Eric> type.  I believe I will suggest that they add the type column to their
Eric> index of the path column, but beyond that I don't believe there is any
Eric> further improvement I can make via indexes.  There won't be a
Eric> signifigant speed improvement by combining any of their indexes into
Eric> multi-part indexes, correct?

It depends on the queries.  If they are just doing one type of
queries, then there is probably not a big seed increase by combining
indexes.

Eric> So, the problem remains that even though I have every column indexed,
Eric> the estimates that MySQL does do not take into account the
Eric> distribution of values in those indexes and thus it does not optimize
Eric> the join order as I would like it to.  Since I'm using the same keys
Eric> for each join, the joins end up in the order that I put them in the
Eric> SQL statement.

This shouldn't have to happen if you the WHERE part is different for
the involved tables.  For example:

SELECT * from table as t1, table as t2 WHERE t1.id=t2.id and
t1.key=constant and t2.key=constant2;

In the above case MySQL can use the key distribution for 'key' to
decide which table it's better to first use in the join.

Eric> So is the correct thing to do still to modify OPTIMIZE and ANALYZE to
Eric> store distribution statistics and then modify get_quick_record_count()
Eric> to look at those for indexed (and non-indexed if it becomes necessary
Eric> / is easy) columns?  Where are the OPTIMIZE and ANALYZE commands?  I
Eric> would start this by simply storing it in memory after running OPTIMIZE
Eric> or ANALYZE and in the future I will add it to the MyISAM table format.

For MyISAM tables, you can find all necessary interace information in
the sql/ha_myisam.h and sql/ha_myisam.cc files.

I suggest you start by checking the code:

int ha_myisam::analyze(THD *thd, HA_CHECK_OPT* check_opt) 

and the function myisam/mi_check.c

If you can get this to work by storing the keys in memory, we can help
you to store these on disk for permanent usage.

Regards,
Monty
Thread