List:General Discussion« Previous MessageNext Message »
From:Michael Stassen Date:June 12 2004 10:44pm
Subject:Re: Slow query: Getting first initials from an entire table's data
View as plain text  
TK wrote:

> At 05:02 PM 6/12/2004 +0200, Harald Fuchs wrote:
> 
>>Other DBMSs like PostgreSQL grok indexes on functional expressions;
>>MySQL doesn't.  Thus your only choice seems to be storing the
>>uppercased initial in a separate column and putting an index on that
>>column.
> 
> 
> As I indicated, I already tried that it had no effect at all. MySQL still
> scans the whole table, still doesn't use any indexes, and takes forever at it.

Nevertheless, Harald is correct.  MySQL cannot use an index on a column when 
you are feeding the column to a function, so your only hope of using an 
index is to add an initials column.

Your statement that this still doesn't work surprised me, so I put it to the 
test.  I made a table named inits with 50,000 randomly chosen initials in a 
column named init and indexed it.  Both

   SELECT DISTINCT init FROM inits ORDER BY init;

and

   SELECT init, COUNT(*) FROM inits GROUP BY INIT;

used the index and took .5 seconds on a 5 year old G3 iMac running mysql 
4.0.20.  If no index was used when you tried an indexed initials column, I 
conclude that something else in your query caused it.

<snip>
>>>I haven't been able to think of a way to do this efficiently.  My current
> query looks like this:
>>>         select DISTINCT UPPER(LEFT(n.Name,1)) as Initial
>>>         from Names n, Things t
>>>         where n.ID = t.ID
>>>         order by Initial desc
>>
>>>Even if I eliminate DISTINCT, or create a single character index on
>>>Name, or create a whole field that just has the first character of
>>>Name, I can't figure out how to get MySQL to not have to scan the
>>>entire table.

I don't understand what you are trying to do here.  You join Names to Things 
on ID, but you aren't retrieving any data from Things?  Perhaps that's only 
part of the real query.

I see 3 problems with this query.

1. As already explained, with UPPER(LEFT(n.Name,1)), we can't use an index 
on Name.  Replace this with n.Init, where n.Init has the first initial and 
is indexed.

2. You are joining the 50,000 rows of Name to rows in Things, then throwing 
away approximately 95% of them with DISTINCT.  Indexes will help, but if you 
only have single column indexes, mysql will choose between the indexes on 
Name.ID and Name.init according to which ought to require looking at fewer 
rows (probably ID).  To really use an index to best advantage here, you'll 
need a multicolumn index on ID and init.  Assuming there's some reason to 
join to Things...

3. Mysql is bad at ORDER BY ... DESC.  The index works great for ASC but 
poorly for DESC.  For example, in my test queries above, mysql takes 10 
times longer if I sort init in descending order (which makes me think mysql 
is sorting too soon).  In my case,

   CREATE TEMPORARY TABLE initials SELECT DISTINCT init FROM inits;
   SELECT * FROM initials ORDER BY init DESC;

is almost as fast as my original, ascending order query.

If the query you quoted above isn't the real query, perhaps if you posted 
the real one someone on the list could suggest a workable alternative.

Michael

Thread
Slow query: Getting first initials from an entire table's dataTK12 Jun
  • Re: Slow query: Getting first initials from an entire table's dataHarald Fuchs12 Jun
    • Re: Slow query: Getting first initials from an entire table'sdataTK12 Jun
      • Re: Slow query: Getting first initials from an entire table's dataMichael Stassen12 Jun