List:General Discussion« Previous MessageNext Message »
From:Matthias Pigulla Date:May 6 1999 9:50pm
Subject:How to optimize this query for MySQL?
View as plain text  
Hi folks,

<parental advisory - explicit description ;->

I admit I'm a database novice, so maybe I've got a wrong idea of how
queries are executed and how the necessary rows can be retreived.
Anyhow, would you mind explaining me why I have such - on my opinion
strange - results in EXPLAIN commands and how I can work around them?

I have a database storing discussion threads. There are several groups,
each containing a lot of threads, once again containing much more
postings. Each posting is written by a user. With my humble knowledge of
db design and normalization, I desgined the db as follows:

 table group       table thread      table posting      table user
         *id <-+            *id <-+            *id  +-->       *id
 other stuff   +--- +belongs to   +--- +belongs to  |     nickname
		      topic etc           text etc  |  other stuff
                                        written by -+

In other words, each entity (posting, thread) references the superior
one. * is a primary key, + a "normal" one. The table index for
"postings" is built as (belongs to, timestamp), for I thought this way
it allowed quick access to MAX(timestamp) when having a given thread ID
for "belongs to".

Now, I would like to SELECT a list of all threads for a certain group.
This list ought to contain the date of the last posting and the number
of postings for each thread, along with the authoring users name,
ordered by lastpost.

Here's my query. There's still a bug, for the selection of "users.nick"
along with the GROUP BY clause does not alway give me the author of the
oldest (first) posting (which is, of course, the user who started the
thread), but that's not important at the moment.

as lastpost
WHERE valid id)
ORDER BY lastpost DESC

In my naive idea of "database internals" I expected MySQL to
- fetch all threads for the group/forum
- for each thread, quickly retrieve MAX(post_date) with the compound
index described above
- add the author

Rather unexpectedly, I get the following EXPLAIN for the above query:

| table    | type   | possible_keys | key     | ref             | rows 
| Extra      |
| postings | ALL    | thread        | NULL    | NULL            | 52898
|            |
| threads  | eq_ref | PRIMARY,forum | PRIMARY | postings.thread |     1
| where used |
| users    | eq_ref | PRIMARY       | PRIMARY | postings.user   |     1
|            |

The "ALL" type in the first row is - as you can easily imagine - a real
drain, even on a cool tuned up system. I tried using "threads
STRAIGHT_JOIN postings", but the product of the rows was even worse, so
the optimizer's desicison seems to be allright.

Now I wonder what to do. Is there a flaw in my database design? Is it an
unwise decision to have COUNT, MAX and GROUP BY in my query - or more in
general, are these functions indicating design flaws like NULL values

For example, I might store the date of the last posting in the thread
table and save the MAX() call, but that seems to be unclean, for in fact
the latest posting contains the latest date, and anything else would be
redundancy and might introduce integrity problems.

Or should I store the group/forum id in each posting, not each thread?
Also unclean, for threads belong to forums and postings belong to
threads - not postings and threads to forums at the same time.

Is there any special (MySQL) thing to think of? Does anyone of you have
an idea how to optimize this?

Any hints are appreciated, thanks a lot for the time!

How to optimize this query for MySQL?Matthias Pigulla7 May
  • Re: How to optimize this query for MySQL?Christian Mack7 May
  • How to optimize this query for MySQL?Michael Widenius11 May
Re: MyODBCEIN)28 Mar