Brandon Shuey wrote:
>
> I have a table with several(7+million) million rows of data. I want to group count
> as portion of the data. However, mysql is
> performing a complete table scan and IT TAKES FOREVER (several hours).
> How can I speed this thing up? I was hoping that indexes would have helped! Here is
> the table and the query and the explain
> result:
>
> The total data between '1999-05-01 00:00:00' and '1999-05-02 00:00:00' is 80488
> (24.7% of all the records).
> Shouldn't it use an index for this?
>
> describe tblVisits;
> +-----------------+-----------+------+-----+---------------------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +-----------------+-----------+------+-----+---------------------+-------+
> | VisitLogID | char(20) | | PRI | | |
> | HostAccount | char(20) | | MUL | | |
> | Style | char(20) | | | | |
> | Category | char(20) | | | | |
> | Server | char(20) | | | | |
> | UserID | char(20) | | | | |
> | Format | char(20) | | | | |
> | Source | char(20) | | | | |
> | RemoteIP | char(20) | | MUL | | |
> | UserAgent | char(20) | | | | |
> | RemoteHost | char(20) | | | | |
> | Referer | char(100) | | | | |
> | Date_Time | datetime | | MUL | 0000-00-00 00:00:00 | |
> | ReferralAccount | char(20) | | | | |
> | Summarized | int(11) | | MUL | 0 | |
> | Quality | int(11) | | | 0 | |
> +-----------------+-----------+------+-----+---------------------+-------+
> show index from tblVisits;
>
> +-----------+------------+------------------+--------------+-------------+-----------+-------------+----------+
> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation
> | Cardinality | Sub_part |
>
> +-----------+------------+------------------+--------------+-------------+-----------+-------------+----------+
> | tblVisits | 0 | PRIMARY | 1 | VisitLogID | A
> | 332995 | NULL |
> | tblVisits | 1 | HostAccount_indx | 1 | HostAccount | A
> | NULL | NULL |
> | tblVisits | 1 | HostAccount_indx | 2 | Style | A
> | NULL | NULL |
> | tblVisits | 1 | HostAccount_indx | 3 | Category | A
> | NULL | NULL |
> | tblVisits | 1 | HostAccount_indx | 4 | Server | A
> | NULL | NULL |
> | tblVisits | 1 | Date_Time | 1 | Date_Time | A
> | NULL | NULL |
> | tblVisits | 1 | Date_Time | 2 | HostAccount | A
> | NULL | NULL |
> | tblVisits | 1 | Date_Time | 3 | Style | A
> | NULL | NULL |
> | tblVisits | 1 | Date_Time | 4 | Style | A
> | NULL | NULL |
> | tblVisits | 1 | Date_Time | 5 | Category | A
> | NULL | NULL |
> | tblVisits | 1 | Date_Time_2 | 1 | Date_Time | A
> | NULL | NULL |
> | tblVisits | 1 | Date_Time_2 | 2 | Server | A
> | NULL | NULL |
> | tblVisits | 1 | RemoteIP | 1 | RemoteIP | A
> | NULL | NULL |
> | tblVisits | 1 | RemoteIP | 2 | Date_Time | A
> | NULL | NULL |
> | tblVisits | 1 | Summarized | 1 | Summarized | A
> | NULL | NULL |
> | tblVisits | 1 | Summarized | 2 | Quality | A
> | NULL | NULL |
>
> +-----------+------------+------------------+--------------+-------------+-----------+-------------+----------+
> EXPLAIN SELECT
> Date_Format(Date_Time,'%Y-%m-%d') as cDate,
> HostAccount,
> Style,
> Category,
> Count(*)
> FROM tblVisits
> WHERE Date_Time >= '1999-05-01 00:00:00' and Date_Time <'1999-05-02 00:00:00'
> GROUP BY cDate,HostAccount,Style,Category;
>
> +-----------+------+-----------------------+------+---------+------+--------+------------+
> | table | type | possible_keys | key | key_len | ref | rows | Extra
> |
>
> +-----------+------+-----------------------+------+---------+------+--------+------------+
> | tblVisits | ALL | Date_Time,Date_Time_2 | NULL | NULL | NULL | 332995 | where
> used |
>
> +-----------+------+-----------------------+------+---------+------+--------+------------+
> select count(*) from tblVisits;
> +----------+
> | count(*) |
> +----------+
> | 332995 |
> +----------+
>
> Brandon Shuey
Hi Brandon
Use the BETWEEN syntax to get date ranges:
SELECT
Date_Format(Date_Time,'%Y-%m-%d') as cDate,
HostAccount,
Style,
Category,
Count(*)
FROM
tblVisits
WHERE
Date_Time BETWEEN '1999-05-01 00:00:00' AND '1999-05-02 00:00:00'
GROUP BY cDate,HostAccount,Style,Category;
This will only speed up a little.
The main problem with your query is the GROUP BY part.
Mysql has to calculate each cDate before grouping.
Tschau
Christian