List:General Discussion« Previous MessageNext Message »
From:Johan De Meersman Date:September 28 2010 3:50pm
Subject:Re: Howto optimize Distinct query over 1.3mil rows?
View as plain text  
If Cal_NO is a recurring value, then yes, that is the way it should be done
in a relational schema.

Your index cardinality of 15.000 against 1.3 million rows is reasonable,
although not incredible; is your index cache large enough to acccomodate all
your indices ?


On Tue, Sep 28, 2010 at 5:02 PM, John Stoffel <john@stripped> wrote:

>
> Hi,
>
> I'm running MySQL 5.0.51a-24+lenny3-log  on a Debian Lenny box with
> 2Gb of RAM and a pair of dual core 2.6Ghz CPUs.  I'm using a pair of
> 40Gb disks mirrored using MD (Linux software RAID) for both the OS and
> the storage of the mysql DBs.
>
> My problem child is doing this simple query:
>
>   mysql> select distinct Call_No from Newspaper_Issues
>   mysql> WHERE BIB_ID = 464;
>   +----------+
>   | Call_No  |
>   +----------+
>   | News     |
>   | NewsD CT |
>   +----------+
>   2 rows in set (2.98 sec)
>
> The Newspaper_Issues table has 1.3 million rows, and has a structure
> like this:
>
>   mysql> describe  Newspaper_Issues;
>   +----------------+-------------+------+-----+---------+----------------+
>   | Field          | Type        | Null | Key | Default | Extra          |
>   +----------------+-------------+------+-----+---------+----------------+
>   | Record_No      | int(11)     | NO   | PRI | NULL    | auto_increment |
>   | BIB_ID         | varchar(38) | NO   | MUL | NULL    |                |
>   | Issue_Date     | date        | NO   | MUL | NULL    |                |
>   | Type_Code      | char(1)     | NO   |     | r       |                |
>   | Condition_Code | char(1)     | NO   |     | o       |                |
>   | Format_Code    | char(1)     | NO   |     | p       |                |
>   | Edition_Code   | char(1)     | NO   |     | n       |                |
>   | Date_Type_Code | char(1)     | NO   |     | n       |                |
>   | Ed_Type        | tinyint(1)  | NO   |     | 1       |                |
>   | RS_Code        | char(1)     | NO   |     | c       |                |
>   | Call_No        | varchar(36) | YES  | MUL | NULL    |                |
>   | Printed_Date   | varchar(10) | YES  |     | NULL    |                |
>   | Update_Date    | date        | NO   |     | NULL    |                |
>   +----------------+-------------+------+-----+---------+----------------+
>   13 rows in set (0.00 sec)
>
>
> I've tried adding various indexes, and reading up on howto optimize
> DISTINCT or GROUP BY queries, but I'm hitting a wall here.  My current
> indexes are:
>
> mysql> show index from Newspaper_Issues;
>
>
> +------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | Table            | Non_unique | Key_name             | Seq_in_index |
> Column_name | Collation | Cardinality | Sub_part | Packed | Null |
> Index_type | Comment |
>
>
> +------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | Newspaper_Issues |          0 | PRIMARY              |            1 |
> Record_No   | A         |     1333298 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID               |            1 |
> BIB_ID      | A         |       14980 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | Call_No              |            1 |
> Call_No     | A         |         927 |     NULL | NULL   | YES  | BTREE
>  |         |
> | Newspaper_Issues |          1 | Issue_Date           |            1 |
> Issue_Date  | A         |       49381 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID_Issue_Date    |            1 |
> BIB_ID      | A         |       14980 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID_Issue_Date    |            2 |
> Issue_Date  | A         |     1333298 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | call_no_short        |            1 |
> Call_No     | A         |          30 |        6 | NULL   | YES  | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID_call_no_short |            1 |
> BIB_ID      | A         |       14980 |     NULL | NULL   |      | BTREE
>  |         |
> | Newspaper_Issues |          1 | BIB_ID_call_no_short |            2 |
> Call_No     | A         |       15503 |     NULL | NULL   | YES  | BTREE
>  |         |
> | Newspaper_Issues |          1 | call_no_bib_id       |            1 |
> Call_No     | A         |         927 |     NULL | NULL   | YES  | BTREE
>  |         |
> | Newspaper_Issues |          1 | call_no_bib_id       |            2 |
> BIB_ID      | A         |       15503 |     NULL | NULL   |      | BTREE
>  |         |
>
>
> +------------------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> 11 rows in set (0.00 sec)
>
>
> So now when I do an explain on my query I get:
>
>    mysql> explain select distinct(Call_No) from Newspaper_Issues WHERE
> BIB_ID = 464;
>
> 
> +----+-------------+------------------+-------+-----------------------------------------------+----------------+---------+------+---------+--------------------------+
>    | id | select_type | table            | type  | possible_keys
>                       | key            | key_len | ref  | rows    | Extra
>                  |
>
> 
> +----+-------------+------------------+-------+-----------------------------------------------+----------------+---------+------+---------+--------------------------+
>    |  1 | SIMPLE      | Newspaper_Issues | index |
> BIB_ID,BIB_ID_Issue_Date,BIB_ID_call_no_short | call_no_bib_id | 227     |
> NULL | 1333298 | Using where; Using index |
>
> 
> +----+-------------+------------------+-------+-----------------------------------------------+----------------+---------+------+---------+--------------------------+
>    1 row in set (0.00 sec)
>
> Interestingly enough, I get much better performance if I just drop the
> WHERE clause, but that doesn't help me get what I want though.  *grin*
>
>    mysql> explain select distinct(Call_No) from Newspaper_Issues;
>
> 
> +----+-------------+------------------+-------+---------------+---------+---------+------+------+--------------------------+
>    | id | select_type | table            | type  | possible_keys | key
> | key_len | ref  | rows | Extra                    |
>
> 
> +----+-------------+------------------+-------+---------------+---------+---------+------+------+--------------------------+
>    |  1 | SIMPLE      | Newspaper_Issues | range | NULL          | Call_No
> | 111     | NULL |  928 | Using index for group-by |
>
> 
> +----+-------------+------------------+-------+---------------+---------+---------+------+------+--------------------------+
>    1 row in set (0.00 sec)
>
>
>
> Would it make sense to split the Call_No data off into it's own table, and
> put in a proper numeric ID, instead of the current VARCHAR(36) it uses?  So
> in Newspaper_Issues I'd just have a Call_No_ID and a new Call_No table with:
>
>      Call_No_ID    INT
>      Call_No       Char(36)
>
> which would hopefully index better?  I only have 928 distinct Call_No
> strings, so I'm not afraid of wasting space or anything.
>
> Thanks,
> John
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
>
>


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

Thread
Howto optimize Distinct query over 1.3mil rows?John Stoffel28 Sep
  • Re: Howto optimize Distinct query over 1.3mil rows?Johnny Withers28 Sep
    • Re: Howto optimize Distinct query over 1.3mil rows?John Stoffel28 Sep
  • Re: Howto optimize Distinct query over 1.3mil rows?Johan De Meersman28 Sep