List:General Discussion« Previous MessageNext Message »
From:Peter Brawley Date:March 29 2011 8:17pm
Subject:Re: A common request
View as plain text  
 > Why not optimize the IN ( ... ) to do the same type of thing?

If the argument to IN() is a list of values, it'll be OK. If it's a 
SELECT, in 5.0 it will be slower than molasses (see "The unbearable 
slowness of IN()" at http://www.artfulsoftware.com/queries.php.

 > I always tried to avoid joins because I am planning to horizontally 
partition my data.

A severe & unfortunate constraint. Can't help you there.

PB

-----

On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
> Yes, this would be fine. But often, the list of friends is obtained 
> from a social network like facebook, and is not stored internally. 
> Basically, I obtain the friend list in a request to facebook, and then 
> see which of those users have created things. So would I have to 
> create a temporary table and insert all those uids just to make a 
> join? Why not optimize the IN ( ... ) to do the same type of thing?
>
> There is also a second problem: I want to use MySQL Cluster, because I 
> expect to have many users. Would it be efficient to use JOIN between 
> the friends table and the articles table? Both tables are partitioned 
> by user_id as the primary key, so the join would have to hit many 
> different nodes. I always tried to avoid joins because I am planning 
> to horizontally partition my data. But if MySQL cluster can handle 
> this join transparently and split it up based on the partition, then 
> that's fine. Do you have any info on this?
>
> Greg
>
> On 3/29/11 2:10 PM, Peter Brawley wrote:
>> > How can I quickly find all the articles written by this user's 
>> friends, and not just random articles?
>>
>> Taking the simplest possible case, with table 
>> friends(userID,friendID) where each friendID refers to a userID in 
>> another row, the friends of userID u are ...
>>
>> select friendID from user where userID=u;
>>
>> so articles by those friends of u are ...
>>
>> select a.* from article a join (  select friendID from user where 
>> userID=u ) f on a.userID=f.friendID;
>>
>> PB
>>
>> -----
>>
>> On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>>> Hey there. My company writes a lot of social applications, and there 
>>> is one operation that is very common, but I don't know if MySQL 
>>> supports it in a good way. I thought I'd write to this list for two 
>>> reasons:
>>>
>>>     1) Maybe MySQL has a good way to do this, and I just don't know 
>>> about it
>>>
>>>     2) Propose to MySQL developers a simple algorithm which would 
>>> greatly improve MySQL support for social networking apps.
>>>
>>>     Here is the situation. Let's say I have built a social 
>>> networking application where people create and edit some item 
>>> (article, photo, music mix, whatever). Now, a typical user logs in, 
>>> and this user has 3000 friends. How can I quickly find all the 
>>> articles written by this user's friends, and not just random articles?
>>>
>>>     Ideally, I would want to write something like this:
>>>
>>>     SELECT * FROM article WHERE user_id IN (345789, 324875, 398, 
>>> ..., 349580)
>>>
>>>     basically, execute a query with a huge IN ( ... ). Maybe if this 
>>> would exceed the buffer size for the MySQL wire protocol, I would 
>>> break up the list into several lists, and execute several queries, 
>>> and union the results together myself.
>>>
>>>     But my point is, this is very common for social networking apps. 
>>> Every app wants to show "the X created by your friends", or "friends 
>>> of yours (given some list from a social network) who have taken 
>>> action X".
>>>
>>>     Here is how I would do it if I had raw access to the MySQL index 
>>> in memory:
>>>
>>>     a) Sort the list of entries in the IN, in ascending order.
>>>
>>>     b) Do *ONE* binary search through the index (assuming it's a 
>>> BTREE index) and get them all in one pass. If it's a HASH index or 
>>> something, I would have to look up each one individually.
>>>
>>>     The benefits of this approach would be that this common 
>>> operation would be done extremely quickly. If the index fits 
>>> entirely in memory, and I just want to get the primary keys (i.e. 
>>> get the list of friends who did X), the disk isn't even touched. In 
>>> addition, for BTREE indexes, I would just need ONE binary search, 
>>> because the entries have been sorted in ascending order.
>>>
>>>     Does MySQL have something like this? And if not, perhaps you can 
>>> add it in the next version? It would really boost MySQL's support 
>>> for social networking apps tremendously. Alternative, how can I add 
>>> this to my MySQL? Any advice would be appreciated.
>>>
>>> Sincerely,
>>> Gregory Magarshak
>>> Qbix
>>>
>
>
Thread
A common requestGregory Magarshak29 Mar
  • Re: A common requestPeter Brawley29 Mar
    • Re: A common requestGregory Magarshak29 Mar
      • Re: A common requestPeter Brawley29 Mar
        • Re: A common requestGregory Magarshak31 Mar
          • Re: A common requestGregory Magarshak31 Mar
            • Re: A common requestJohan De Meersman31 Mar
          • Re: A common requestmos31 Mar
            • Re: A common requestJohan De Meersman31 Mar
            • Re: A common requestmos31 Mar
              • Re: A common requestWm Mussatto31 Mar
  • Re: A common requestSander de Bruijne29 Mar