List:General Discussion« Previous MessageNext Message »
From:Jesse Sheidlower Date:August 22 2003 2:58am
Subject:Re: Slow results with simple, well-indexed query
View as plain text  
On Thu, Aug 21, 2003 at 06:58:29PM -0700, Steven Roussey wrote:
>Jesse Sheidlower wrote:
> 
> > Hmm. When I returned to the multiple-table query that started
> > this thread, 
> 
> And it was slow. Yeah, one thing at a time. It makes it easier for
> people reading this list now or in the future (if it comes up in a
> search result) if we go over things one item at a time.
> 
> Since I never saw the whole table definitions (the indexes in
> particular), I'll have to try and guess through it. So try this:

All the indexes were single indexes, partly because I haven't
yet made the effort to understand composite index. I guess it's
time ;-).

Here's the CREATEs, somewhat edited to remove parts not relevant
to this discussion, to save space:

CREATE TABLE `cg` (
  `q_id` int(10) unsigned NOT NULL default '0',
  `cw` varchar(100) default NULL,
  `exp` text,
  KEY `q_id` (`q_id`),
  KEY `cw` (`cw`),
  KEY `q_id_2` (`q_id`,`cw`),
  FULLTEXT KEY `exp` (`exp`)
) TYPE=MyISAM

CREATE TABLE `q` (
  `id` int(10) unsigned NOT NULL default '0',
  `cit_id` int(10) unsigned NOT NULL default '0',
  `q_tag` enum('q','qau','qca','qna','qsa') default NULL,
  `qt` text,
  PRIMARY KEY  (`id`),
  KEY `cit_id` (`cit_id`),
  FULLTEXT KEY `qt` (`qt`)
) TYPE=MyISAM 

CREATE TABLE `cit` (
  `id` int(10) unsigned NOT NULL default '0',
  `sref_id` int(10) unsigned NOT NULL default '0',
  `w` varchar(200) default NULL,
  PRIMARY KEY  (`id`),
  KEY `sref_id` (`sref_id`),
  FULLTEXT KEY `w` (`w`),
  ) TYPE=MyISAM 

CREATE TABLE `sref` (
  `id` int(10) unsigned NOT NULL default '0',
  `rdr` varchar(30) default NULL,
  `kbd` varchar(20) default NULL,
  `cd` date default NULL,
  PRIMARY KEY  (`id`),
  KEY `rdr` (`rdr`),
  KEY `kbd` (`kbd`),
  KEY `cd` (`cd`)
) TYPE=MyISAM

> ALTER TABLE cg add index(q_id,cw);

I did this, as is reflected in the CREATE above.

> Tell me how that works and send the EXPLAIN.

Unfortunately, it made no difference--the first execution was
about 1 m 15 sec, and one immediately thereafter was about 3.5 sec,
as before. The EXPLAIN shows:

mysql> EXPLAIN SELECT cg.cw FROM cg,q,cit,sref
    -> WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
    -> AND cg.cw LIKE 't%'
    -> AND sref.cd > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
    -> ORDER BY cg.cw
    -> LIMIT 1000,10;
+-------+--------+-----------------+---------+---------+-------------+--------+-------------+
| table | type   | possible_keys   | key     | key_len | ref         | rows   | Extra     
 |
+-------+--------+-----------------+---------+---------+-------------+--------+-------------+
| cg    | range  | q_id,cw,q_id_2  | cw      |     101 | NULL        | 190550 | Using
where |
| q     | eq_ref | PRIMARY,cit_id  | PRIMARY |       4 | cg.q_id     |      1 |           
 |
| cit   | eq_ref | PRIMARY,sref_id | PRIMARY |       4 | q.cit_id    |      1 |           
 |
| sref  | eq_ref | PRIMARY,cd      | PRIMARY |       4 | cit.sref_id |      1 | Using
where |
+-------+--------+-----------------+---------+---------+-------------+--------+-------------+

> The point here is that now you are doing a join and you are using both
> columns to qualify the resultset. So we should use a composite index
> rather than have individual ones (of which MySQL will choose only one).

What does this mean for regular searching? In most cases, there will be
some criteria entered that need to be searched on, and the id fields 
will also be needed for the joins. For example, in the database, one
might want to search based on cg.exp (fulltext), sref.rdr, sref.cd
(the date field), sref.kbd, cit.w, and various other ones I've edited
out of this display to save space, and often a combination of several
of these at once. How should I set up indexes for the potential 
searches that might be executed? 

(I should mention that this is a read-only database; it's built from
a parsed SGML file and is never added to directly, if that's an
issue.)

> Also, you can change line 
> 	AND cg.cw BETWEEN 't' AND 'tzzz'
> To 
> 	AND cg.cw like 't%'
> For better readability (how many zzz's are enough, eh?). Personal
> preference.

No, I agree, and it was originally "LIKE 't%'" and is still like
that in the actual code being generated by the query form. I changed
it to the BETWEEN because in some playing around it seemed to be 
faster this way, and in fact I was worried about having to
generate the 

"BETWEEN \'$val\' AND \'" . $val . "zzzz\'"

thing in my program. If this was a glitch of my badly-indexed
original, I'm glad to fix it.

Thanks again for taking the time to look this over.

Best,

Jesse
Thread
Slow results with simple, well-indexed querySteven Roussey22 Aug
  • Re: Slow results with simple, well-indexed queryJesse Sheidlower22 Aug
    • RE: Slow results with simple, well-indexed querySteven Roussey22 Aug
      • Re: Slow results with simple, well-indexed queryJesse Sheidlower22 Aug
        • RE: Slow results with simple, well-indexed querySteven Roussey22 Aug
          • Re: Slow results with simple, well-indexed queryJesse Sheidlower22 Aug
            • RE: Slow results with simple, well-indexed querySteven Roussey22 Aug
              • Re: Slow results with simple, well-indexed queryJesse Sheidlower22 Aug
                • RE: Slow results with simple, well-indexed querySteven Roussey22 Aug
                  • Re: Slow results with simple, well-indexed queryJesse Sheidlower22 Aug
                    • RE: Slow results with simple, well-indexed querySteven Roussey22 Aug
                      • Re: Slow results with simple, well-indexed queryJesse Sheidlower22 Aug
            • RE: Slow results with simple, well-indexed querySteven Roussey22 Aug
              • Re: Slow results with simple, well-indexed queryJesse Sheidlower22 Aug
        • Re: Slow results with simple, well-indexed queryKeith C. Ivey22 Aug
          • Re: Slow results with simple, well-indexed queryJesse Sheidlower22 Aug