List:General Discussion« Previous MessageNext Message »
From:Rolando Edwards Date:October 9 2006 4:42pm
Subject:Re: How to speed up query of indexed column with 5M rows?
View as plain text  
What is the cardinality of the flow_fk_owner index?

If the number of distinct owners is low, say 50, then the 'flow_fk_owner'
index contains 50 sets of 100,000 rows with the same key. Even if you run
ANALYZE TABLE or OPTIMIZE TABLE, if the key distribution of the owner across
the table (not the index) is bad (in other words, if the owner fields are
fragmented across the table resulting in many blocks of data being read in
query cachce, or the data was not bulk loaded in order by owner), you could
not increase performance on the owner field. You would have to bulk load the
table from a file that is sort ordered by owner and then OPTIMIZE TABLE,
or simply drop the 'flow_fk_owner' index altogether since MySQL determines that
a full table scan is best.

What is the cardinality of the 'flow_fk_owner' index ???

If the number of distinct owners is too low for the dataset,
then MySQL has it right in this case.

----- Original Message -----
From: Brent Baisley <brenttech@stripped>
To: bowen <linuzboy@stripped>, mysql@stripped
Sent: Monday, October 9, 2006 10:52:47 AM GMT-0500 US/Eastern
Subject: Re: How to speed up query of indexed column with 5M rows?

For some reason it's not using the index on the owner field (flow_fk_owner), it's doing a
full table scan. Which also explains the 
i/o results from vmstat. Since MySQL is not using the index, that means it determined a
full table scan would be quicker (MySQL 
doesn't always get this right).
You should look at your SHOW INDEX FROM flow. That will show you the data distribution
(cardinality) of you index. You may need to 
run an analyze table to get mysql to update it's data stats, which it uses to determine
the best way to run a query.
Also try forcing MySQL to use the index.
SELECT * FROM flow FORCE INDEX flow_fk_owner WHERE owner=11251

Keep in mind that when you do a SELECT *, you are transferring all the data in each record
for the found set. If you just do a 
SELECT count(id)... that will tell you how long it took to run the query without the added
time of transferring the results.

----- Original Message ----- 
From: "bowen" <linuzboy@stripped>
To: <mysql@stripped>
Sent: Monday, October 09, 2006 8:54 AM
Subject: How to speed up query of indexed column with 5M rows?


> How to speed up query of indexed column with 5M rows?
>
> I have a table with more than 5M rows. (400M .MYD 430M .MYI).
>
> It took 27 seconds to do a common select...where... in the index column.
>
> I can not bear the long run.
>
> Vmstat show that system was bounded by IO busy.(Always more than 13000
> bi/s, blocks input / second .)
>
> I increase some options to speed up query.
> I try "SET GLOBAL key_buffer_size=256*1024*1024", the query still need
> 14 seconds.
> try "read_buffer_size = 2M" again, down to 9s.
>
>
> It is a great improvement, but the result still can not be acceptable,
> no matter to multiple query.
>
>
> Is there any method to speed up the query. I found the select by
> primary key of the 5M Row is very quickly(0.05s).
>
> I found if it is IO busy, process will hang much time for loading. If
> index is cached in memory, process will be much more faster.
>
> Bottleneck is IO performance. How to improve mysql io performance?
>
>
>
> Mysql 4.1.21 + linux 2.6.17 + 512M memory + IDE Disk udma5
>
> -----------------------
> CREATE TABLE /*!32312 IF NOT EXISTS*/ `flow` (
> `id` bigint(20) unsigned NOT NULL auto_increment,
> `name` varchar(255) NOT NULL default '',
> `owner` int(10) NOT NULL default '0',
> `uuid` varchar(36) NOT NULL default '',
> `length` int(11) default '0',
> `time` int(11) default '0',
> PRIMARY KEY (`id`),
> UNIQUE KEY `id` (`id`),
> UNIQUE KEY `flow_uuidindex` (`uuid`),
> KEY `flow_nameindex` (`name`),
> KEY `flow_fk_owner` (`owner`)
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
>
>
> mysql> explain select * from flow where owner=11251;
>
> +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
> | id | select_type | table | type | possible_keys | key | key_len |
> ref | rows | Extra |
>
> +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
> | 1 | SIMPLE | flow | ALL | NULL | NULL | NULL | NULL | 5122593 | Using where |
>
> +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
> 1 row in set (0.00 sec)
>
> mysql> explain select * from flow where owner=11251;
> Empty set (18.82 sec)
>
>
> If there is no method to improve, can you advise me a redesign of table?
> I think there is a way to do if I only use unique key in where.
>
> Thanks
>
>
> Owner index was 253285 unique owner, probably means user. On average,
> one user have 20 flows.
> mysql> select count(*) from owner;
> +----------+
> | count(*) |
> +----------+
> | 253285 |
> +----------+
> 1 row in set (0.00 sec)
>
> Sorry for misspell of gid. It is uuid of the flow. I think I should
> redesign the table. But how to solve the huge one-multiple
> relationship.
>
> The flow table is the only largest table in the database, the others
> are relative small just like the owner table size.
>
>
> I have a design. Create an additinal field in owner table to store the
> pk(or uuid) of flows owned by this user. Than use a store procedure to
> update this field called by changing to the flow table. But I think it
> is a bad design, right ? And also mysql 4.1.x do not support store
> procedure. I do not want to migrate the whole database, and I am sure
> there are some incompatible.
>
> Thanks...
> 


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=1


Thread
How to speed up query of indexed column with 5M rows?bowen9 Oct
  • Re: How to speed up query of indexed column with 5M rows?Dan Buettner9 Oct
    • Re: How to speed up query of indexed column with 5M rows?Jacques Marneweck9 Oct
      • Re: How to speed up query of indexed column with 5M rows?Rolando Edwards9 Oct
  • Re: How to speed up query of indexed column with 5M rows?Brent Baisley9 Oct
    • Re: How to speed up query of indexed column with 5M rows?Rolando Edwards9 Oct
Re: How to speed up query of indexed column with 5M rows?bowen10 Oct