> Why not optimize the IN ( ... ) to do the same type of thing?
If the argument to IN() is a list of values, it'll be OK. If it's a
SELECT, in 5.0 it will be slower than molasses (see "The unbearable
slowness of IN()" at http://www.artfulsoftware.com/queries.php.
> I always tried to avoid joins because I am planning to horizontally
partition my data.
A severe & unfortunate constraint. Can't help you there.
PB
-----
On 3/29/2011 1:27 PM, Gregory Magarshak wrote:
> Yes, this would be fine. But often, the list of friends is obtained
> from a social network like facebook, and is not stored internally.
> Basically, I obtain the friend list in a request to facebook, and then
> see which of those users have created things. So would I have to
> create a temporary table and insert all those uids just to make a
> join? Why not optimize the IN ( ... ) to do the same type of thing?
>
> There is also a second problem: I want to use MySQL Cluster, because I
> expect to have many users. Would it be efficient to use JOIN between
> the friends table and the articles table? Both tables are partitioned
> by user_id as the primary key, so the join would have to hit many
> different nodes. I always tried to avoid joins because I am planning
> to horizontally partition my data. But if MySQL cluster can handle
> this join transparently and split it up based on the partition, then
> that's fine. Do you have any info on this?
>
> Greg
>
> On 3/29/11 2:10 PM, Peter Brawley wrote:
>> > How can I quickly find all the articles written by this user's
>> friends, and not just random articles?
>>
>> Taking the simplest possible case, with table
>> friends(userID,friendID) where each friendID refers to a userID in
>> another row, the friends of userID u are ...
>>
>> select friendID from user where userID=u;
>>
>> so articles by those friends of u are ...
>>
>> select a.* from article a join ( select friendID from user where
>> userID=u ) f on a.userID=f.friendID;
>>
>> PB
>>
>> -----
>>
>> On 3/29/2011 12:50 PM, Gregory Magarshak wrote:
>>> Hey there. My company writes a lot of social applications, and there
>>> is one operation that is very common, but I don't know if MySQL
>>> supports it in a good way. I thought I'd write to this list for two
>>> reasons:
>>>
>>> 1) Maybe MySQL has a good way to do this, and I just don't know
>>> about it
>>>
>>> 2) Propose to MySQL developers a simple algorithm which would
>>> greatly improve MySQL support for social networking apps.
>>>
>>> Here is the situation. Let's say I have built a social
>>> networking application where people create and edit some item
>>> (article, photo, music mix, whatever). Now, a typical user logs in,
>>> and this user has 3000 friends. How can I quickly find all the
>>> articles written by this user's friends, and not just random articles?
>>>
>>> Ideally, I would want to write something like this:
>>>
>>> SELECT * FROM article WHERE user_id IN (345789, 324875, 398,
>>> ..., 349580)
>>>
>>> basically, execute a query with a huge IN ( ... ). Maybe if this
>>> would exceed the buffer size for the MySQL wire protocol, I would
>>> break up the list into several lists, and execute several queries,
>>> and union the results together myself.
>>>
>>> But my point is, this is very common for social networking apps.
>>> Every app wants to show "the X created by your friends", or "friends
>>> of yours (given some list from a social network) who have taken
>>> action X".
>>>
>>> Here is how I would do it if I had raw access to the MySQL index
>>> in memory:
>>>
>>> a) Sort the list of entries in the IN, in ascending order.
>>>
>>> b) Do *ONE* binary search through the index (assuming it's a
>>> BTREE index) and get them all in one pass. If it's a HASH index or
>>> something, I would have to look up each one individually.
>>>
>>> The benefits of this approach would be that this common
>>> operation would be done extremely quickly. If the index fits
>>> entirely in memory, and I just want to get the primary keys (i.e.
>>> get the list of friends who did X), the disk isn't even touched. In
>>> addition, for BTREE indexes, I would just need ONE binary search,
>>> because the entries have been sorted in ascending order.
>>>
>>> Does MySQL have something like this? And if not, perhaps you can
>>> add it in the next version? It would really boost MySQL's support
>>> for social networking apps tremendously. Alternative, how can I add
>>> this to my MySQL? Any advice would be appreciated.
>>>
>>> Sincerely,
>>> Gregory Magarshak
>>> Qbix
>>>
>
>