List:General Discussion« Previous MessageNext Message »
From:jmat Date:April 28 2003 4:28pm
Subject:strange behavior with query optimizer regarding 'range'
View as plain text  
>Description:

I have a table that collects hit statistics for a website.  I am storing the data hourly,
based on a user
identifier. The table structure:

CREATE TABLE hits_hourly (
  uid mediumint(8) unsigned NOT NULL default '0',
  hournum int(3) unsigned NOT NULL default '0',
  hitcount smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (hournum,uid),
  KEY uid (uid)
) TYPE=MyISAM DELAY_KEY_WRITE=1;

The hour number is calculated like this: (to_days(now())*24+hour(now())).  So I am getting
a
unique hour number.

I am attempting to retrieve data like this:

select sum(hitcount) from hits_hourly where hournum >=
(to_days(now())*24+hour(now())-24);

This would give me a sum of the hitcounts from 24 hours ago to now.  I would suspect this
would use
a range to get the data.

+-------------+------+---------------+------+---------+------+-------+-------------+
| table       | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+-------------+------+---------------+------+---------+------+-------+-------------+
| hits_hourly | ALL  | PRIMARY       | NULL |    NULL | NULL | 21674 | Using where |
+-------------+------+---------------+------+---------+------+-------+-------------+

Explain says otherwise.

So I tried doing it differently, making my application figure out the hournum:

explain select sum(hitcount) from hits_hourly where hournum between 17560744 and 17560746;
+-------------+-------+---------------+---------+---------+------+------+-------------+
| table       | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+-------------+-------+---------------+---------+---------+------+------+-------------+
| hits_hourly | range | PRIMARY       | PRIMARY |       4 | NULL | 2722 | Using where |
+-------------+-------+---------------+---------+---------+------+------+-------------+

That works.. using a range.. and it's a lot faster, but it's only got 3 hours in there.

But... when I do more than 3...

explain select sum(hitcount) from hits_hourly where hournum between 17560744 and 17560747;
+-------------+------+---------------+------+---------+------+-------+-------------+
| table       | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+-------------+------+---------------+------+---------+------+-------+-------------+
| hits_hourly | ALL  | PRIMARY       | NULL |    NULL | NULL | 21674 | Using where |
+-------------+------+---------------+------+---------+------+-------+-------------+

we go back to a full table scan, and it's VERY slow (well, comparitively... 0.40 seconds
to .04 seconds).

If I create a new index, consisting of (hournum,hitcount), it will use that key
consistently -- but that is
duplicating a lot of data into the index that shouldn't really be there.


>How-To-Repeat:
	see description, sample data set is at http://shutdown.net/~jmat/hits_hourly.sql.gz
>Fix:
	

>Submitter-Id:	<submitter ID>
>Originator:	Justin Matlock
>Organization:  N/A
>MySQL support: none
>Synopsis:	query optimizer switches from range to ALL for no good reason
>Severity:	non-critical 
>Priority:	low
>Category:	mysql
>Class:		sw-bug
>Release:	mysql-4.0.12 (Official MySQL RPM)
>Server: /usr/bin/mysqladmin  Ver 8.40 Distrib 4.0.12, for pc-linux on i686
Copyright (C) 2000 MySQL AB & MySQL Finland AB & TCX DataKonsult AB
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL license

Server version		4.0.12-Max-log
Protocol version	10
Connection		Localhost via UNIX socket
UNIX socket		/var/lib/mysql/mysql.sock
Uptime:			12 hours 52 min 56 sec

Threads: 179  Questions: 916059  Slow queries: 30  Opens: 636  Flush tables: 1  Open
tables: 193  Queries per second avg: 19.753
>C compiler:    2.95.3
>C++ compiler:  2.95.3
>Environment:
	Redhat 7.3; custom kernel to support IDE RAID controller
System: Linux epsilon 2.4.19 #12 SMP Wed Feb 12 12:01:05 EST 2003 i686 unknown
Architecture: i686

Some paths:  /usr/bin/perl /usr/bin/make /usr/bin/gmake /usr/bin/gcc /usr/bin/cc
GCC: Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-113)
Compilation info: CC='gcc'  CFLAGS='-O6 -fno-omit-frame-pointer -mpentium'  CXX='g++' 
CXXFLAGS='-O6 -fno-omit-frame-pointer 	          -felide-constructors -fno-exceptions
-fno-rtti -mpentium'  LDFLAGS=''  ASFLAGS=''
LIBC: 
lrwxrwxrwx    1 root     root           13 Mar 25 17:48 /lib/libc.so.6 -> libc-2.2.5.so
-rwxr-xr-x    1 root     root      1260480 Mar  5 21:32 /lib/libc-2.2.5.so
-rw-r--r--    1 root     root      2312442 Mar  5 21:09 /usr/lib/libc.a
-rw-r--r--    1 root     root          178 Mar  5 20:51 /usr/lib/libc.so
Configure command: ./configure '--disable-shared' '--with-mysqld-ldflags=-all-static'
'--with-client-ldflags=-all-static' '--without-berkeley-db' '--with-innodb'
'--without-vio' '--without-openssl' '--enable-assembler' '--enable-local-infile'
'--with-mysqld-user=mysql' '--with-unix-socket-path=/var/lib/mysql/mysql.sock'
'--prefix=/' '--with-extra-charsets=complex' '--exec-prefix=/usr'
'--libexecdir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share'
'--localstatedir=/var/lib/mysql' '--infodir=/usr/share/info' '--includedir=/usr/include'
'--mandir=/usr/share/man' '--with-embedded-server' '--enable-thread-safe-client'
'--with-comment=Official MySQL RPM' 'CFLAGS=-O6 -fno-omit-frame-pointer -mpentium'
'CXXFLAGS=-O6 -fno-omit-frame-pointer 	          -felide-constructors -fno-exceptions
-fno-rtti -mpentium'

Thread
strange behavior with query optimizer regarding 'range'jmat28 Apr
  • Re: strange behavior with query optimizer regarding 'range'Dan Nelson28 Apr
  • Re: strange behavior with query optimizer regarding 'range'Justin Matlock30 Apr