From: Peter Brawley Date: March 29 2011 8:17pm Subject: Re: A common request List-Archive: http://lists.mysql.com/mysql/224717 Message-Id: <4D923E59.4050809@earthlink.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit > Why not optimize the IN ( ... ) to do the same type of thing? If the argument to IN() is a list of values, it'll be OK. If it's a SELECT, in 5.0 it will be slower than molasses (see "The unbearable slowness of IN()" at http://www.artfulsoftware.com/queries.php. > I always tried to avoid joins because I am planning to horizontally partition my data. A severe & unfortunate constraint. Can't help you there. PB ----- On 3/29/2011 1:27 PM, Gregory Magarshak wrote: > Yes, this would be fine. But often, the list of friends is obtained > from a social network like facebook, and is not stored internally. > Basically, I obtain the friend list in a request to facebook, and then > see which of those users have created things. So would I have to > create a temporary table and insert all those uids just to make a > join? Why not optimize the IN ( ... ) to do the same type of thing? > > There is also a second problem: I want to use MySQL Cluster, because I > expect to have many users. Would it be efficient to use JOIN between > the friends table and the articles table? Both tables are partitioned > by user_id as the primary key, so the join would have to hit many > different nodes. I always tried to avoid joins because I am planning > to horizontally partition my data. But if MySQL cluster can handle > this join transparently and split it up based on the partition, then > that's fine. Do you have any info on this? > > Greg > > On 3/29/11 2:10 PM, Peter Brawley wrote: >> > How can I quickly find all the articles written by this user's >> friends, and not just random articles? >> >> Taking the simplest possible case, with table >> friends(userID,friendID) where each friendID refers to a userID in >> another row, the friends of userID u are ... >> >> select friendID from user where userID=u; >> >> so articles by those friends of u are ... >> >> select a.* from article a join ( select friendID from user where >> userID=u ) f on a.userID=f.friendID; >> >> PB >> >> ----- >> >> On 3/29/2011 12:50 PM, Gregory Magarshak wrote: >>> Hey there. My company writes a lot of social applications, and there >>> is one operation that is very common, but I don't know if MySQL >>> supports it in a good way. I thought I'd write to this list for two >>> reasons: >>> >>> 1) Maybe MySQL has a good way to do this, and I just don't know >>> about it >>> >>> 2) Propose to MySQL developers a simple algorithm which would >>> greatly improve MySQL support for social networking apps. >>> >>> Here is the situation. Let's say I have built a social >>> networking application where people create and edit some item >>> (article, photo, music mix, whatever). Now, a typical user logs in, >>> and this user has 3000 friends. How can I quickly find all the >>> articles written by this user's friends, and not just random articles? >>> >>> Ideally, I would want to write something like this: >>> >>> SELECT * FROM article WHERE user_id IN (345789, 324875, 398, >>> ..., 349580) >>> >>> basically, execute a query with a huge IN ( ... ). Maybe if this >>> would exceed the buffer size for the MySQL wire protocol, I would >>> break up the list into several lists, and execute several queries, >>> and union the results together myself. >>> >>> But my point is, this is very common for social networking apps. >>> Every app wants to show "the X created by your friends", or "friends >>> of yours (given some list from a social network) who have taken >>> action X". >>> >>> Here is how I would do it if I had raw access to the MySQL index >>> in memory: >>> >>> a) Sort the list of entries in the IN, in ascending order. >>> >>> b) Do *ONE* binary search through the index (assuming it's a >>> BTREE index) and get them all in one pass. If it's a HASH index or >>> something, I would have to look up each one individually. >>> >>> The benefits of this approach would be that this common >>> operation would be done extremely quickly. If the index fits >>> entirely in memory, and I just want to get the primary keys (i.e. >>> get the list of friends who did X), the disk isn't even touched. In >>> addition, for BTREE indexes, I would just need ONE binary search, >>> because the entries have been sorted in ascending order. >>> >>> Does MySQL have something like this? And if not, perhaps you can >>> add it in the next version? It would really boost MySQL's support >>> for social networking apps tremendously. Alternative, how can I add >>> this to my MySQL? Any advice would be appreciated. >>> >>> Sincerely, >>> Gregory Magarshak >>> Qbix >>> > >