List:General Discussion« Previous MessageNext Message »
From:Joerg Bruehe Date:May 27 2010 8:30am
Subject:Re: Slow query using string functions
View as plain text  

Jerry Schwartz wrote:
> I have a pretty simple query that seems to take a lot longer than it ought to 
> (over 2 minutes).
> [[...]]
> 	feed_new.new_title AS `New Title FROM Feed`,
> 	prod.prod_pub_prod_id AS `Lib Code FROM DB`,
> 	prod.prod_title AS `Title FROM DB`,
> 	prod.prod_num AS `Prod Num`,
> 	prod.prod_published AS `Published FROM DB`
> FROM feed_new JOIN prod
> ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) = 
> LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
> ORDER BY feed_new.new_title;
> [[...]]
> The query is doing a scan of the 9816 records that have pub_id = @PUBID, but 
> even so this seems like a long time. Are the built-in string functions really 
> that slow?

The general rule is:
If you are not taking the value of a column directly but are applying
any function to it (like "LEFT" in your statement), an index cannot be
used. So the table needs to be accessed (scanned) and the function
computed on each row, to evaluate the predicate (the comparison).

I am no expert in checking "explain" output, so I may well be wrong in
my guess:
I think the execution will scan the whole "feed_new" table (895 records)
for each of those 9816 matches of pub_id, so it is doing 8785320 calls
of "LEFT()" followed by a string comparison.

> I suspect it would be faster if I built separate tables that had just the 
> shortened versions of the titles, but I wouldn't think that would be 
> necessary.

So IMO you have two choices:
- Either you accept the performance implications of a table scan
  (which will of course get worse when your data grows),
- or you introduce another column in your table in which you store the
  function result (maintained on INSERT and UPDATE) and create an index
  on this column.

An additional table with the shortened columns is no good idea IMO,
because you would need to maintain it in sync with your "real" data.
In your example, it should be sufficient to add the new column to table
"feed", because your execution strategy should start by evaluating
    prod.pub_id = @PUBID


Joerg Bruehe,  MySQL Build Team,  Joerg.Bruehe@stripped
Sun Microsystems GmbH,   Komturstrasse 18a,   D-12099 Berlin
Geschaeftsfuehrer: Juergen Kunz
Amtsgericht Muenchen: HRB161028

Slow query using string functionsJerry Schwartz26 May
  • RE: Slow query using string functionsGavin Towey27 May
    • RE: Slow query using string functionsJerry Schwartz27 May
  • Re: Slow query using string functionsJoerg Bruehe27 May
  • Re: Slow query using string functionsBaron Schwartz27 May
    • RE: Slow query using string functionsJerry Schwartz27 May