List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:October 26 2001 3:30pm
Subject:Re: Possible bug in self-join order optimization
View as plain text  

I am just now on the Linux Lunacy conference which takes place on a
Cruise ship in the Caribbean which doesn't have an Internet
connection; Until the conference is over, my email responses will be a
bit delayed.

>>>>> "Eric" == Eric  <ej@stripped> writes:

Eric> Well, I would definitely have to do the count for each query; not
Eric> because my table sizes are changing (although they are at a fairly
Eric> rapid rate), but because the number of rows I want to select is vastly
Eric> different between queries.  This is actually a self-join (refer to
Eric> first emails from me to this list a few days ago), so I can't just
Eric> count how many rows are in each table, because there is only one.  The
Eric> problem is that each self-join in my query has widely varying number
Eric> of rows it returns based on what range I specify for the nvalue
Eric> column.

I agree.

Eric> So I'm not really sure what to do...Wouldn't doing a COUNT on each
Eric> self-join involved table take almost as much time as running the query
Eric> on each table?

Yes; This is not a good idea to do.

Eric> I wouldn't be opposed to implementing this as a part of the join
Eric> optimizer in MySQL, in fact, I've been reading through it for a few
Eric> days now...  However, it seems like it would be a large project as the
Eric> join optimizer does not take WHERE conditions on the joins into
Eric> account at all when estimating number of rows coming from a table.  In
Eric> addition, I would probably need to start storing some more metadata in
Eric> order to facillitate the kind of optimization I need...  has anyone
Eric> thought about doing this?

In we do an estimation of the number of rows that an
index covers.

It would not be that hard to do something similar for 'any' column,
just to know the number of rows the WHERE will cover.

The way to do this would be to extend:
'get_quick_record_count()' to also take into account not indexed

The best way to get the meta data is to add gathering this to the
OPTIMIZE and ANALYZE commands.

There is some ways to store the meta data:

- Store them in memory after an analyze or optimize. (Very easy but
  not that convenient).
- Store them in the table handler and add an interface in the handler
  to retrieve them. (Easy)
- Store them in the .frm file
  (Will be easer to do in MySQL 4.0.2 (?) when we change the .frm format)
  (Medium hard)
- Store them in memory based on run-time data from previous queries.

For the moment I would suggest you add this to the meta-data we
already have in MyISAM.

The big question is what kind of data to store.

I think that storing N (10 ?) ranges for each column that tells us how
many % of the rows are in each range should be sufficient to get a
good indication of how many rows the WHERE will accept.

Erik, did you already try changing your indexes, as I suggested in an
earlier email?


For technical support contracts, goto
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /    Mr. Michael Widenius <monty@stripped>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, CTO
/_/  /_/\_, /___/\___\_\___/   Helsinki, Finland