List:General Discussion« Previous MessageNext Message »
From:Dan Buettner Date:May 10 2006 4:58pm
Subject:Re: slow query
View as plain text  
I think you are right about O log N performance when finding matching 
records within the index - however, the index doesn't contain all the 
data, so the mysql server has to hit the table as stored on disk to find 
what you were actually asking for (select *).  That becomes time 
consuming as it sifts through all the data from disk to retrieve 
matching records starting and ending at the right points.  Note in your 
EXPLAIN output that MySQL thinks it will have to examine about 506222 
rows to find the matching one(s); that's a pretty large number of rows 
to examine (half your table).

That's where having a wider distribution of user_id values would help, I 
think.  Of course, you haven't got that wider distribution, so that's a 
pointless discussion (I suggested looking for other user_ids earlier 
merely to confirm whether such queries were faster).

Not that this solves your problem, but try
SELECT user_id, fullname
FROM contacts WHERE user_id=1 ORDER BY fullname
LIMIT 1 OFFSET 500000;
as this should not hit the table on disk, only the index, since the 
user_id_2 index IS the data for this query.  I'm more familiar with 
MyISAM than InnoDB but I think this holds true for both table types.

Dan



Adam Wolff wrote:
> Thanks for the response, Dan. I did try ORDER BY on the table. Didn't help 
> -- I presume because the query is using an index.
> 
> Unfortunately, the point of my current development is to show searches 
> against millions of contacts, so the suggestion about working with other 
> user_ids isn't too practical.
> 
> I will look into increasing the size of my data cache.
> 
> I guess what surprises me is that I thought that the index was stored as a 
> BTree in sort order. I'm pretty bad with big-O, but I thought this would 
> suggest O log N performance to find a given offset within the index.
> 
> A
> 
> On May 10, Dan Buettner wrote:
> 
>> I would expect the problem to be that the further down in the data you go by
>> using OFFSET, the more records the mysql server has to scan in order to get to
>> what you want.  This will produce a fairly linear slowdown the further in you
>> go - it just takes time to check through 1,000,000 matching records.
>> Especially on desktop grade hardware where you probably haven't got the
>> fastest disk subsystem.
>>
>> I think in this case your slow searches may be a result of the heavy bias in
>> your data toward user_id 1.  Try your search on some of your other user_ids
>> and see.  With so many records for the same user_id, your search for that
>> user_id is necessitating something pretty close to a table scan, even though
>> it's hitting an index.
>>
>> Some suggestions would be to increase the size of your data cache, so that
>> after your first queries, the data (or more of it) is in memory. Assuming
>> you'll be deploying on server hardware, a faster disk system should help quite
>> a bit too.  Memory caches on hardware RAID systems can help with this kind of
>> thing too.
>>
>> You might also try
>> ALTER TABLE contacts ORDER BY user_id, fullname
>> to get your data sorted into the same order you're looking through it, though
>> it may well affect other queries you need to run against the same data.  I'm
>> not certain whether you can ORDER BY more than one column:
>> http://dev.mysql.com/doc/refman/5.0/en/alter-table.html
>> Also note that as you add or delete rows the table does not stay in order.
>>
>> Hope this helps!
>>
>> Dan
>>
>>
>> Adam Wolff wrote:
>>> I have a very simple table that looks like this:
>>> CREATE TABLE `contacts` (
>>> `id` int(11) NOT NULL auto_increment,
>>> `fullname` varchar(100) default NULL,
>>> `user_id` int(11) NOT NULL,
>>> PRIMARY KEY  (`id`),
>>> KEY `user_id` (`user_id`),
>>> KEY `user_id_2` (`user_id`,`fullname`),
>>> CONSTRAINT `contacts_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `user`
>>> (`id`)
>>> ENGINE=InnoDB DEFAULT CHARSET=utf8
>>>
>>>
>>> It's a bit of a lopsided table in that of the 1,000,100 records in the db,
>>> 1,000,000 of them belong to user_id 1. But I wouldn't expect this to
>>> skew my results.
>>>
>>> I am writing a little paging server that retrieves pages of data using
>>> LIMIT and OFFSET.
>>>
>>> I'm really surprised by how slowly my queries are running on a
>>> relatively fast desktop machine. Records near the top of the list are
>>> fine:
>>> mysql> SELECT * FROM contacts WHERE user_id=1 ORDER BY fullname
>>>       LIMIT 1 OFFSET 0;
>>> +--------+--------------+-----------------------------+---------+----------+
> 
>>> | id     | fullname     | email                       | user_id | nickname |
>>> +--------+--------------+-----------------------------+---------+----------+
> 
>>> | 371543 | Aaron Abbott | Abbott_Aaron@stripped |       1 | aaronab  |
>>> +--------+--------------+-----------------------------+---------+----------+
> 
>>> 1 row in set (0.03 sec)
>>>
>>> But as I move down the list, queries run slower and slower:
>>> mysql> SELECT * FROM contacts WHERE user_id=1 ORDER BY fullname
>>>       LIMIT 1 OFFSET 100000;
>>> +--------+--------------+-----------------------------+---------+----------+
> 
>>> | id     | fullname     | email                       | user_id | nickname |
>>> +--------+--------------+-----------------------------+---------+----------+
> 
>>> | 726543 | Benny Abbott | Abbott_Benny@stripped |       1 | bennyab  |
>>> +--------+--------------+-----------------------------+---------+----------+
> 
>>> 1 row in set (2.94 sec)
>>>
>>> mysql> SELECT * FROM contacts WHERE user_id=1 ORDER BY fullname
>>>       LIMIT 1 OFFSET 500000;
>>>
> +--------+---------------+------------------------------+---------+----------+ 
>>> | id     | fullname      | email                        | user_id | nickname
>>> |
>>>
> +--------+---------------+------------------------------+---------+----------+ 
>>> | 309543 | Jimmie Abbott | Abbott_Jimmie@stripped |       1 | jimmieab
>>> |
>>>
> +--------+---------------+------------------------------+---------+----------+ 
>>> 1 row in set (12.75 sec)
>>>
>>> EXPLAIN says:
>>>
> +----+-------------+----------+------+-------------------+-----------+---------+-------+--------+-------------+
> 
>>> | id | select_type | table    | type | possible_keys     | key       |
>>> key_len | ref   | rows   | Extra       |
>>>
> +----+-------------+----------+------+-------------------+-----------+---------+-------+--------+-------------+
> 
>>> |  1 | SIMPLE      | contacts | ref  | user_id,user_id_2 | user_id_2 |
>>> 4       | const | 506222 | Using where |
>>>
> +----+-------------+----------+------+-------------------+-----------+---------+-------+--------+-------------+
> 
>>>
>>> In other words, it *is* using an index for this query. Anyone have any
>>> advice for me?
>>>
>>> Thanks,
>>> Adam
>>>
>>
> 
Thread
slow queryAdam Wolff10 May
  • Re: slow queryDan Buettner10 May
    • Re: slow queryAdam Wolff10 May
      • Re: slow queryDan Buettner10 May