>>>>> "maurice" == maurice <maurice@stripped> writes:
>> Description:
maurice> We have the following table in which we record every login to our site:
maurice> CREATE TABLE logins (
maurice> registration int(10) unsigned DEFAULT '0' NOT NULL,
maurice> login_date datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
maurice> KEY registration (registration,login_date),
maurice> KEY login_date (login_date)
maurice> );
maurice> We are running the following query to find all the registrants who
maurice> logged in the previous day:
maurice> SELECT DISTINCT registration
maurice> FROM logins
maurice> WHERE login_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AND
maurice> login_date < CURRENT_DATE();
maurice> When the number of matching rows is small, EXPLAIN indicates that a
maurice> "range" search is used. For example, with 101,000 total rows,
maurice> 1,000 of which are from the previous day, explain shows:
maurice>
> +--------+-------+---------------+------------+---------+------+------+-------+
maurice> | table | type | possible_keys | key | key_len | ref | rows | Extra
> |
maurice>
> +--------+-------+---------------+------------+---------+------+------+-------+
maurice> | logins | range | login_date | login_date | NULL | NULL | 676 |
> |
maurice>
> +--------+-------+---------------+------------+---------+------+------+-------+
maurice> That looks good. However, when the number of rows for a specific day
maurice> becomes a larger percentage of the total rows, the optimizer chooses an
maurice> "index" search based on the registration key. Here's the EXPLAIN results
maurice> with 120,000 total rows, 20,000 of which are from the previous day:
maurice>
> +--------+-------+---------------+--------------+---------+------+--------+-------------------------+
maurice> | table | type | possible_keys | key | key_len | ref | rows |
> Extra |
maurice>
> +--------+-------+---------------+--------------+---------+------+--------+-------------------------+
maurice> | logins | index | login_date | registration | 12 | NULL | 120000 |
> where used; Using index |
maurice>
> +--------+-------+---------------+--------------+---------+------+--------+-------------------------+
>
maurice> If the registration column isn't referenced, it reverts back to a range
maurice> search:
maurice> EXPLAIN SELECT DISTINCT login_date FROM logins WHERE
maurice> login_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AND
maurice> login_date < CURRENT_DATE();
maurice>
> +--------+-------+---------------+------------+---------+------+-------+---------------+
maurice> | table | type | possible_keys | key | key_len | ref | rows | Extra
> |
maurice>
> +--------+-------+---------------+------------+---------+------+-------+---------------+
maurice> | logins | range | login_date | login_date | NULL | NULL | 20048 | ;
> Using index |
maurice>
> +--------+-------+---------------+------------+---------+------+-------+---------------+
>
maurice> My question is, is the optimizer making the best decision in this case?
maurice> Is it really faster to use an "index" search in the above scenario?
Hi!
MySQL tries to calculate how many 'blocks' it needs to access to
satisfy a query.
If it uses the 'login_date' index, MySQL will have to pull out each
row in a random order from disk.
If MySQL uses the registration key, MySQL will only have to consult
the index blocks, which hopefully are cached in the key buffer.
Which version is faster is largely depending on how your table is
laid out on your disk, but neither version should be that bad.
Regards,
Monty