List: General Discussion
From: Jigal van Hemert
Date: June 14, 2005, 9:23am
Subject: Re: Slow query: optimizer ignores index, using filesort
From: "Scott Gifford"
> >> Apparently MySQL's optimizer sees that it can use the primary key for
> >> mirealsource_home_supplemental to do the query, but for some reason
> >> decides not to.
> > This is often the case when the query will probably return more than 30%
> > of the records in that table. In such cases it is more efficient to do a
> > table scan (which is indicated here by the 'ALL' type in the explain
> > output).
> Right, ALL would be a great plan if it weren't for the LIMIT 1.

The LIMIT 1 will be performed *after* the recordset is sorted :-(
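
A sketch of the problem shape (table names taken from the thread, SELECT list
abbreviated, and the exact ORDER BY column is my assumption):

```sql
-- The plan chosen here joins everything, filesorts the full result,
-- and only then applies the limit:
SELECT h.*
FROM mirealsource_homes h
LEFT JOIN mirealsource_homes_supplemental s ON s.mls_num = h.mls_num
ORDER BY h.mls_num
LIMIT 1;
-- If the ORDER BY column could be resolved via an index on the first
-- non-const table, MySQL could read one row in index order and stop.
```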

> > Furthermore MySQL can only use an index for sorting if all columns in the
> > ORDER BY clause are from the first table in the explain output that does
> > not have a 'const' join type. This is why setting the ORDER BY to
> > mirealsource_homes_supplemental.mls_num will remove the 'Using filesort'
> > and result in faster sorting.
> I'm a little surprised MySQL can't figure out on its own that it can
> get the same effect by using mls_num from either table, since the
> tables are joined on it, so the values will always be identical.

You have two LEFT JOINs, so the value of mls_num might be something (the
identical value you refer to) or NULL. So, you expect MySQL to evaluate the
JOIN conditions, figure out that the ORDER BY column is actually the same as
columns from other tables, and check whether one of those columns belongs to
the first non-const table in the execution path?
I agree with you that 'using filesort' should be considered in the
optimizer, but this requires quite a bit of knowledge on the part of the
optimizer. (E.g. how many more rows should I allow in the estimated
evaluation set in favour of sorting with an index? How should this relate to
the sizes of various buffers, etc.? What about different engines (MyISAM and
InnoDB store and handle indexes quite differently)?).

> > MySQL tries to optimize queries by (among other things) estimating which
> > path will result in the smallest number of records. It appears that this
> > path with 100 * 8 * 8 * 8 (51200) records is the minimum size it can
> > achieve.
> It looks to me like it's actually 100 * 1 * 1 * 1 = 100 (8 is the key
> length),

You're right, I looked in the wrong column.

> which is the same as what I get when I force a STRAIGHT_JOIN.
> So with two plans that will use the same number of records, I'm
> surprised MySQL doesn't choose the one that will allow it to use an
> index.
There is no limiting condition whatsoever in the WHERE clause (except the
JOIN condition), so MySQL will estimate, based on the cardinality and the
number of records in each table, which execution path will require it to
evaluate the smallest number of records. It will also try to estimate whether
the use of an index is appropriate or not. But I can imagine that it will
favour a full table scan of a small table that results in a small recordset,
especially when it knows that the other, bigger tables are JOINed with only
one or a few records per join.
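
For reference, the experiment Scott mentions can be sketched like this (SELECT
list abbreviated; table names follow the earlier discussion):

```sql
-- STRAIGHT_JOIN forces MySQL to join the tables in the order they are
-- written, overriding the optimizer's own ordering. Comparing this
-- EXPLAIN output with that of the unhinted query shows which plan the
-- optimizer rejected:
EXPLAIN
SELECT STRAIGHT_JOIN h.mls_num
FROM mirealsource_homes h
LEFT JOIN mirealsource_homes_supplemental s ON s.mls_num = h.mls_num
ORDER BY h.mls_num
LIMIT 1;
```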

I do think that improvements to the optimizer should have a high priority
for the product, but I guess that features such as stored procedures,
triggers, and MySQL Cluster are more important for the larger, paying
customers. With a bigger share of the paying market there will be more
resources available to improve other parts of the product.

> I know there are a couple of tricks like that to fix this one query.
> What makes this hard is that that query is one of many that can be
> generated by a CGI-to-SQL search gateway.  If I put in a FORCE INDEX
> (mls_num) and the user searches by price, for example, query
> performance will be very bad.  In order to know the right indexes to
> force, as far as I can tell I'd have to implement my own optimizer,
> which seems somewhat excessive.

I'm about to do the same myself, although it's not a very high priority
project right now. We see performance differences in queries with InnoDB
tables which can be explained by variations in the cardinality estimates in
InnoDB resulting in different execution paths.
We will look into doing an EXPLAIN for certain generated queries and adding
some USE INDEX hints to make sure the right execution path is chosen.
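
As a sketch of what such a generated hint could look like (index and column
names here are hypothetical):

```sql
-- USE INDEX limits the optimizer to the listed index(es) for this table;
-- FORCE INDEX goes further and makes a table scan look very expensive
-- in the cost model:
SELECT h.mls_num, h.price
FROM mirealsource_homes h USE INDEX (PRIMARY)
ORDER BY h.mls_num
LIMIT 10;
```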

This is part of a routine that generates many different queries based on a
wide variety of search forms, predefined selections, etc. Most of the time
MySQL does a great job with finding the 'right' execution path, but
sometimes it goes horribly wrong.

> > In this query you want the data where mls_num is as small as possible. Is
> > there a way you can limit the number of records by using an extra WHERE
> > condition? This way you may change the order of the tables and make the
> > query faster.
> I tried that using mirealsource_homes.mls_num in the WHERE clause and
> it didn't make a difference.

I meant that you could/should try to limit the number of records that are
selected in the first place. For example, we had a list of the ten most
recently subscribed members. This was queried by selecting the active
members, sorting them on subscription date, and limiting the result to ten.
As more and more members were added, the resulting recordset grew and sorting
became very slow. When we were asked to optimize the query, we noticed that a
lot of new members joined the site each day, so we changed it to select only
the active members who joined in the last ten days, sort these, and limit
that to ten. This was a lot faster! (Yes, we check whether this yields at
least ten records and, if not, increase the number of days we look into the
past, but that is not likely to happen.)
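
In SQL terms the rewrite looked roughly like this (table and column names are
made up for the example):

```sql
-- Before: sort all active members, then take ten.
-- The filesort grows with the whole table:
SELECT member_id, name
FROM members
WHERE active = 1
ORDER BY subscribed DESC
LIMIT 10;

-- After: preselect a recent slice first, so only a handful of rows
-- ever reach the sort:
SELECT member_id, name
FROM members
WHERE active = 1
  AND subscribed >= DATE_SUB(CURDATE(), INTERVAL 10 DAY)
ORDER BY subscribed DESC
LIMIT 10;
```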

So, can you think of a limitation that would preselect a part of the
mirealsource_homes.mls_num table?

> Are there any other global things I can try, to tell MySQL to avoid
> table scans?  The queries almost always use at most LIMIT 10, so a
> table scan is generally not the right idea.

A full table scan is the right idea as long as you select more than 30% of
the table, sort it, and only after that apply the LIMIT 10 (which is the
only solution for such a query).

Regards, Jigal.
