List:General Discussion« Previous MessageNext Message »
From:Yashesh Bhatia Date:December 12 2007 1:35pm
Subject:Eliminating duplicates from self join results
View as plain text  
Hello:

    I have the following table

select * from addressbook
+----+--------------------+
| id | email              |
+----+--------------------+
|  1 | yasheshb@stripped |
|  2 | yasheshb@stripped |
|  3 | yasheshb@stripped |
+----+--------------------+
3 rows in set (0.00 sec)

Now i wanted to find a list of duplicate contacts wherein i can get
the 1st contact with same email
and merge with the others. so in the above case id 1 has duplicates 2
and 3, 2 has 1 & 3 and 3 and 1 & 2.

I'm only interested in getting the first set of duplicates i.e. "1 has
duplicates 2 & 3".

So i tried the query
select t1.id as id1, t2.id as id2, t1.email as email1, t2.email as email2
from addressbook t1, addressbook t2
where t1.email = t2.email and t1.id != t2.id
order by t1.id
+-----+-----+--------------------+--------------------+
| id1 | id2 | email1             | email2             |
+-----+-----+--------------------+--------------------+
|   1 |   2 | yasheshb@stripped | yasheshb@stripped |
|   1 |   3 | yasheshb@stripped | yasheshb@stripped |
|   2 |   1 | yasheshb@stripped | yasheshb@stripped |
|   2 |   3 | yasheshb@stripped | yasheshb@stripped |
|   3 |   1 | yasheshb@stripped | yasheshb@stripped |
|   3 |   2 | yasheshb@stripped | yasheshb@stripped |
+-----+-----+--------------------+--------------------+

then i tried the query similar to the one suggested in the MySQL
Cookbook Recipe 14.5

select DISTINCT if(t1.id < t2.id, t1.id, t2.id) as id1,
       if(t1.id < t2.id, t1.email, t2.email) as email1,
       if(t1.id < t2.id, t2.id, t1.id) as id2,
       if(t1.id < t2.id, t2.email, t1.email) as email2
from addressbook t1, addressbook t2
where t1.email = t2.email and t1.id != t2.id
order by t1.id
+-----+--------------------+-----+--------------------+
| id1 | email1             | id2 | email2             |
+-----+--------------------+-----+--------------------+
|   1 | yasheshb@stripped |   2 | yasheshb@stripped |
|   1 | yasheshb@stripped |   3 | yasheshb@stripped |
|   2 | yasheshb@stripped |   3 | yasheshb@stripped |
+-----+--------------------+-----+--------------------+

I'm stuck trying to get a query that will give me only
+-----+--------------------+-----+--------------------+
| id1 | email1             | id2 | email2             |
+-----+--------------------+-----+--------------------+
|   1 | yasheshb@stripped |   2 | yasheshb@stripped |
|   1 | yasheshb@stripped |   3 | yasheshb@stripped |
+-----+--------------------+-----+--------------------+

Any help, feeback is deeply appreciated. Thanks a bunch in advance.

Yashesh Bhatia
Thread
Eliminating duplicates from self join resultsYashesh Bhatia12 Dec
  • Re: Eliminating duplicates from self join resultsJoerg Bruehe12 Dec
    • Re: Eliminating duplicates from self join resultsJoerg Bruehe13 Dec
  • Re: Eliminating duplicates from self join resultsBrent Baisley13 Dec