OK, I've read everything I can about optimizing queries but I think I'm
still missing something.
Here's the scenario:
Table 'url' stores URLs and ID numbers.
Table 'pageview' stores ID numbers (related to ID numbers in the url
table), a count of how many times they were requested, and the date.
mysql> describe url;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| url | varchar(128) | | MUL | | |
| id | int(11) | | PRI | 0 | auto_increment |
+-------+--------------+------+-----+---------+----------------+
mysql> describe pageview;
+-------+-----------------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------------------+------+-----+------------+-------+
| id | mediumint(8) unsigned | | MUL | 0 | |
| date | date | | MUL | 0000-00-00 | |
| site | int(11) | | | 0 | |
| count | int(11) | | | 0 | |
+-------+-----------------------+------+-----+------------+-------+
I believe I have a good set of keys:
mysql> show keys from url;
+-------+------------+----------+--------------+-------------+-----------+--
-----------+----------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation |
Cardinality | Sub_part |
+-------+------------+----------+--------------+-------------+-----------+--
-----------+----------+
| url | 0 | PRIMARY | 1 | id | A |
71070 | NULL |
| url | 0 | urlkey | 1 | url | A |
71070 | 70 |
+-------+------------+----------+--------------+-------------+-----------+--
-----------+----------+
mysql> show keys from pageview;
+----------+------------+----------+--------------+-------------+-----------
+-------------+----------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation
| Cardinality | Sub_part |
+----------+------------+----------+--------------+-------------+-----------
+-------------+----------+
| pageview | 1 | urlidx | 1 | id | A
| 78489 | NULL |
| pageview | 1 | dateidx | 1 | date | A
| 25 | NULL |
+----------+------------+----------+--------------+-------------+-----------
+-------------+----------+
mysql> select sum(count) from pageview,url where url.url like '/sports/%'
and pageview.date='1999-06-16' and url.id=pageview.id;
+------------+
| sum(count) |
+------------+
| 14398 |
+------------+
1 row in set (16 min 53.27 sec)
this is too slow. table URL has 71070 unique URLs in it. table pageview
has 1255834 rows (can't be unique because a given URL should show up across
many days).
mysql> explain select sum(count) from pageview,url where url.url like
'/sports/%' and pageview.date='1999-06-16' and url.id=pageview.id;
+----------+-------+----------------+---------+---------+------+-------+----
---+
| table | type | possible_keys | key | key_len | ref | rows |
Extra |
+----------+-------+----------------+---------+---------+------+-------+----
---+
| pageview | ref | urlidx,dateidx | dateidx | 3 | ??? | 38080 |
|
| url | range | PRIMARY,urlkey | urlkey | NULL | NULL | 3681 |
|
+----------+-------+----------------+---------+---------+------+-------+----
---+
this looks to me like it's picking the right keys. am i wrong? what else
can i do to speed things up?
Jon Drukman
Director Of Technology
GameSpot