In the last episode (Dec 27), Chris Picton said:
> I am trying to get to grips with understanding mysql performance.
>
> I have the following query:
>
> select sql_no_cache count(*), date_format(calldate, '%y-%m-%d') as m from
> cdr_warehouse group by m;
>
> This gives me:
> 115 rows in set (59.52 sec)
>
> mysql> explain select count(*), date_format(calldate, '%y-%m-%d') as m from
> cdr_warehouse group by m\G
> *************************** 1. row ***************************
> id: 1
> select_type: SIMPLE
> table: cdr_warehouse
> type: index
> possible_keys: NULL
> key: calldate_idx
> key_len: 8
> ref: NULL
> rows: 43708571
> Extra: Using index; Using temporary; Using filesort
> 1 row in set (0.00 sec)
>
> How can I improve the running speed of this query? I am running
> 5.1.30, but don't (yet) want to partition the table (horizontally or
> vertically). Nothing else on the server is touching this table at
> the moment. The exact date_format is not important, as I may want to
> group by 5 second intervals, or full months.
With an index scan on a 43-million-record index, you're going to be
CPU-bound instead of I/O-bound on that query, so use the fastest
expression you can in the group-by clause. Surprisingly, date_format
seems to be pretty good. For grouping by days, date(calldate) is
around 20% faster, but converting to a unix_timestamp and DIViding (the
only way to get things like 5-second intervals) is 2x slower.
Converting that datetime field into a timestamp field should speed up
the unix_timestamp function (since timestamps are internally stored as
unix time_t values), but you'll need to disable the auto-updating
features as well:
http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html
--
Dan Nelson
dnelson@stripped