List:General Discussion« Previous MessageNext Message »
From:Dan Bolser Date:July 14 2005 3:55pm
Subject:SQL 'clustering' query?
View as plain text  
Hi, I remember reading about an SQL query type which did something like
select all 'aircraft hangers' which contained exactly (or at least) some
given set of aircraft.

Or did it select the list of pilots qualified to fly all the airplanes in
the hanger... 

Anyway, I forget the syntax (and the fancy name for this kind of query).

What I want to do is the following, given this data...


Table: ATTRIBUTE_LIST;

ID	ATTRIBUTE
W	A
W	B
W	C
X       A
X	B
X	C
Y	A
Y	B
Y	C
Y	D
Z	E

--> SQL MAGIC --> 


Table: CLUSTERS

G_ID	ID
1	W
1	X
2	Y
3	Z


That is, to group together all ID's with the same 'set' of ATTRIBUTES.

Currently I am doing this using 'GROUP_CONCAT', but my attribute list just
went above the limit for the GROUP_CONCAT column...

+---------+------+--------------------------------------+
| Level   | Code | Message                              |
+---------+------+--------------------------------------+
| Warning | 1260 | 7 line(s) were cut by GROUP_CONCAT() |
+---------+------+--------------------------------------+

My query looks roughly like this...


SET @i:=0, @x:='', @row:='';
#
DROP    TABLE CLUSTERS;
CREATE  TABLE CLUSTERS
  (PRIMARY KEY (ID), INDEX (G_ID))
#
SELECT
  ID, G_ID
  #
FROM
(
  SELECT
    ID,
    #
    @x:= ATTR_LIST               AS HIDDEN1,
    #
    IF(@row = @x, @i, @i:=@i+1) AS G_ID,
    #
       @row:= @x                 AS HIDDEN2
    #
  FROM 
  (
    SELECT 
      ID,
      GROUP_CONCAT(ATTRIBUTE) AS ATTR_LIST,
    FROM
      ATTRIBUTE_LIST
    GROUP BY
      ID
    #
  ) AS vt1
  #
  ORDER BY          -- This is very important for
    ATTR_LIST       -- the overall query.
  #
) AS vt2;

(And thats the highly simplified version!)


I can't shake the feeling that this 'string based' approach (while quite
speedy) is inherently messy, and that a proper 'set based' approach
should exist, and shouldn't have the limitation in the number of
attributes that the above method has.

In general I would really like to (somehow) develop a suite of easy to use
'SQL CLUSTER' commands, as the data mining community needs that kind of
thing in nice general (set based) abundance :)

Anyway, thanks for any feedback on any of the above,

Dan.

Thread
SQL 'clustering' query?Dan Bolser14 Jul
  • Re: SQL 'clustering' query?Peter Brawley14 Jul
    • Re: SQL 'clustering' query?Dan Bolser15 Jul