List:General Discussion« Previous MessageNext Message »
From:mos Date:March 31 2011 7:33pm
Subject:Re: A common request
View as plain text  
At 11:20 AM 3/31/2011, you wrote:
>At 07:29 AM 3/31/2011, you wrote:
>>Thanks for your insight! But I'm still worried about the performance of 
>>IN ( big list of values ). Can you tell me how it is implemented?
>>
>>Suppose I have SELECT a FROM b WHERE c IN (1, 4, 5, 117, 118, 119, ..., 
>>387945)
>>
>>1) If I put 200 values there, does it do 200 individual SELECTs 
>>internally, and union them?
>
>No. It uses one Select statement.
>
>>Or does it notice that c has a "UNIQUE" index and thus at most one row 
>>can be returned per SELECT, and does them all at once?
>
>The IN() clause is very inefficient because MySQL will NOT use the index. 
>It will have to traverse the entire table looking for these values. That 
>is why a table join will be much faster than using IN().

Oops. Sorry, the In() clause in MySQL 5.5 does use the index (not sure when 
they implemented that).  Even so, I still find it slow when the In() clause 
has a lot of elements.
I should have used an Explain in front of my Select statement to see if the 
index is used before I posted.

Mike



>>2) If I want to get just the primary key, or join with another table 
>>based on just the primary key, does this query ever touch the disk 
>>(assuming the index is in memory, which I think it always is -- correct 
>>me if I'm wrong about that).
>
>It will get the information from the index and not have to access the 
>record data from disk. If the index is stored in memory, then it won't 
>have to go to disk (unless you also have a sort). That is why the query 
>cache is so important.
>
>
>>The way I would recommend doing it (for BTREE indexes, anyway) is to sort 
>>the values in ascending order, and do the search in one pass through the 
>>index. The index is already in memory, and it would be straightforward to 
>>modify a binary search algorithm to find the rows corresponding to 
>>monotonically ascending values of the primary key, all in one pass.
>>
>>Even if the binary search algorithm is run 200 or 2000 times for a list, 
>>it would still be faster than hitting the disk. (Even though the CPU 
>>cache performance would be worse.)
>>
>>Can you let me know the specifics of it, and especially how I can avoid 
>>hitting the I/O bottlenecks?
>
>Use a table join and make sure you have the indexes loaded into memory. 
>See http://dev.mysql.com/doc/refman/5.0/en/load-index.html. If using 
>InnoDb then its index cache scheme is quite good.
>
>
>Mike
>
>
>>Thank you,
>>Greg
>>
>>On 3/29/11 4:17 PM, Peter Brawley wrote:
>>> > Why not optimize the IN ( ... ) to do the same type of thing?
>>>
>>>If the argument to IN() is a list of values, it'll be OK. If it's a 
>>>SELECT, in 5.0 it will be slower than molasses (see "The unbearable 
>>>slowness of IN()" at http://www.artfulsoftware.com/queries.php.
>>>
>>> > I always tried to avoid joins because I am planning to horizontally 
>>> partition my data.
>>>
>>>A severe & unfortunate constraint. Can't help you there.
>>>
>>>PB
>>>
>>>-----
>>>
>>>On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
>>>>Yes, this would be fine. But often, the list of friends is obtained 
>>>>from a social network like facebook, and is not stored internally. 
>>>>Basically, I obtain the friend list in a request to facebook, and then 
>>>>see which of those users have created things. So would I have to create 
>>>>a temporary table and insert all those uids just to make a join? Why 
>>>>not optimize the IN ( ... ) to do the same type of thing?
>>>>
>>>>There is also a second problem: I want to use MySQL Cluster, because I 
>>>>expect to have many users. Would it be efficient to use JOIN between 
>>>>the friends table and the articles table? Both tables are partitioned 
>>>>by user_id as the primary key, so the join would have to hit many 
>>>>different nodes. I always tried to avoid joins because I am planning to 
>>>>horizontally partition my data. But if MySQL cluster can handle this 
>>>>join transparently and split it up based on the partition, then that's 
>>>>fine. Do you have any info on this?
>>>>
>>>>Greg
>>>>
>>>>On 3/29/11 2:10 PM, Peter Brawley wrote:
>>>>> > How can I quickly find all the articles written by this user's 
>>>>> friends, and not just random articles?
>>>>>
>>>>>Taking the simplest possible case, with table friends(userID,friendID)
> 
>>>>>where each friendID refers to a userID in another row, the friends of
> 
>>>>>userID u are ...
>>>>>
>>>>>select friendID from user where userID=u;
>>>>>
>>>>>so articles by those friends of u are ...
>>>>>
>>>>>select a.* from article a join (  select friendID from user where 
>>>>>userID=u ) f on a.userID=f.friendID;
>>>>>
>>>>>PB
>>>>>
>>>>>-----
>>>>>
>>>>>On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>>>>>>Hey there. My company writes a lot of social applications, and
> there 
>>>>>>is one operation that is very common, but I don't know if MySQL 
>>>>>>supports it in a good way. I thought I'd write to this list for
> two reasons:
>>>>>>
>>>>>>     1) Maybe MySQL has a good way to do this, and I just don't
> know 
>>>>>> about it
>>>>>>
>>>>>>     2) Propose to MySQL developers a simple algorithm which would
> 
>>>>>> greatly improve MySQL support for social networking apps.
>>>>>>
>>>>>>     Here is the situation. Let's say I have built a social 
>>>>>> networking application where people create and edit some item 
>>>>>> (article, photo, music mix, whatever). Now, a typical user logs
> in, 
>>>>>> and this user has 3000 friends. How can I quickly find all the 
>>>>>> articles written by this user's friends, and not just random
> articles?
>>>>>>
>>>>>>     Ideally, I would want to write something like this:
>>>>>>
>>>>>>     SELECT * FROM article WHERE user_id IN (345789, 324875, 398,
> 
>>>>>> ..., 349580)
>>>>>>
>>>>>>     basically, execute a query with a huge IN ( ... ). Maybe if
> this 
>>>>>> would exceed the buffer size for the MySQL wire protocol, I would
> 
>>>>>> break up the list into several lists, and execute several
> queries, 
>>>>>> and union the results together myself.
>>>>>>
>>>>>>     But my point is, this is very common for social networking
> apps. 
>>>>>> Every app wants to show "the X created by your friends", or
> "friends 
>>>>>> of yours (given some list from a social network) who have taken
> action X".
>>>>>>
>>>>>>     Here is how I would do it if I had raw access to the MySQL
> index 
>>>>>> in memory:
>>>>>>
>>>>>>     a) Sort the list of entries in the IN, in ascending order.
>>>>>>
>>>>>>     b) Do *ONE* binary search through the index (assuming it's a
> 
>>>>>> BTREE index) and get them all in one pass. If it's a HASH index
> or 
>>>>>> something, I would have to look up each one individually.
>>>>>>
>>>>>>     The benefits of this approach would be that this common 
>>>>>> operation would be done extremely quickly. If the index fits 
>>>>>> entirely in memory, and I just want to get the primary keys (i.e.
> 
>>>>>> get the list of friends who did X), the disk isn't even touched.
> In 
>>>>>> addition, for BTREE indexes, I would just need ONE binary search,
> 
>>>>>> because the entries have been sorted in ascending order.
>>>>>>
>>>>>>     Does MySQL have something like this? And if not, perhaps you
> can 
>>>>>> add it in the next version? It would really boost MySQL's support
> 
>>>>>> for social networking apps tremendously. Alternative, how can I
> add 
>>>>>> this to my MySQL? Any advice would be appreciated.
>>>>>>
>>>>>>Sincerely,
>>>>>>Gregory Magarshak
>>>>>>Qbix
>>
>>
>>--
>>MySQL General Mailing List
>>For list archives: http://lists.mysql.com/mysql
>>To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe:    http://lists.mysql.com/mysql?unsub=1

Thread
A common requestGregory Magarshak29 Mar
  • Re: A common requestPeter Brawley29 Mar
    • Re: A common requestGregory Magarshak29 Mar
      • Re: A common requestPeter Brawley29 Mar
        • Re: A common requestGregory Magarshak31 Mar
          • Re: A common requestGregory Magarshak31 Mar
            • Re: A common requestJohan De Meersman31 Mar
          • Re: A common requestmos31 Mar
            • Re: A common requestJohan De Meersman31 Mar
            • Re: A common requestmos31 Mar
              • Re: A common requestWm Mussatto31 Mar
  • Re: A common requestSander de Bruijne29 Mar