From: Peter Brawley Date: July 14 2005 5:01pm Subject: Re: SQL 'clustering' query? List-Archive: http://lists.mysql.com/mysql/186507 Message-Id: <42D69A82.1080008@earthlink.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Dan, Hi, I remember reading about an SQL query type which did something like select all 'aircraft hangers' which contained exactly (or at least) some given set of aircraft. Or did it select the list of pilots qualified to fly all the airplanes in the hanger... Anyway, I forget the syntax (and the fancy name for this kind of query). I think the concept you're after is relational division. The aeroplane hangar was one of Celko's examples. Another from him is at http://www.artfulsoftware.com/queries.php#28, other examples at http://www.artfulsoftware.com/queries.php#22, http://www.artfulsoftware.com/queries.php#33, PB Dan Bolser wrote: >Hi, I remember reading about an SQL query type which did something like >select all 'aircraft hangers' which contained exactly (or at least) some >given set of aircraft. > >Or did it select the list of pilots qualified to fly all the airplanes in >the hanger... > >Anyway, I forget the syntax (and the fancy name for this kind of query). > >What I want to do is the following, given this data... > > >Table: ATTRIBUTE_LIST; > >ID ATTRIBUTE >W A >W B >W C >X A >X B >X C >Y A >Y B >Y C >Y D >Z E > >--> SQL MAGIC --> > > >Table: CLUSTERS > >G_ID ID >1 W >1 X >2 Y >3 Z > > >That is, to group together all ID's with the same 'set' of ATTRIBUTES. > >Currently I am doing this using 'GROUP_CONCAT', but my attribute list just >went above the limit for the GROUP_CONCAT column... > >+---------+------+--------------------------------------+ >| Level | Code | Message | >+---------+------+--------------------------------------+ >| Warning | 1260 | 7 line(s) were cut by GROUP_CONCAT() | >+---------+------+--------------------------------------+ > >My query looks roughly like this... > > >SET @i:=0, @x:='', @row:=''; ># >DROP TABLE CLUSTERS; >CREATE TABLE CLUSTERS > (PRIMARY KEY (ID), INDEX (G_ID)) ># >SELECT > ID, G_ID > # >FROM >( > SELECT > ID, > # > @x:= ATTR_LIST AS HIDDEN1, > # > IF(@row = @x, @i, @i:=@i+1) AS G_ID, > # > @row:= @x AS HIDDEN2 > # > FROM > ( > SELECT > ID, > GROUP_CONCAT(ATTRIBUTE) AS ATTR_LIST, > FROM > ATTRIBUTE_LIST > GROUP BY > ID > # > ) AS vt1 > # > ORDER BY -- This is very important for > ATTR_LIST -- the overall query. > # >) AS vt2; > >(And thats the highly simplified version!) > > >I can't shake the feeling that this 'string based' approach (while quite >speedy) is inherently messy, and that a proper 'set based' approach >should exist, and shouldn't have the limitation in the number of >attributes that the above method has. > >In general I would really like to (somehow) develop a suite of easy to use >'SQL CLUSTER' commands, as the data mining community needs that kind of >thing in nice general (set based) abundance :) > >Anyway, thanks for any feedback on any of the above, > >Dan. > > > > -- No virus found in this outgoing message. Checked by AVG Anti-Virus. Version: 7.0.323 / Virus Database: 267.8.15/49 - Release Date: 7/14/2005