MySQL Lists are EOL. Please join:

List:General Discussion« Previous MessageNext Message »
From:SGreen Date:July 14 2004 5:44pm
Subject:RE: Weeding out duplicates
View as plain text  

You _can_ do this in SQL, it just takes several steps.

What you need to do is create a table that contains the ID values of the
records you want to get rid of. If you want to keep the "first" record in a
set of dupes try this:

FROM yourtablenamehere t
GROUP BY email
HAVING count(1) >1;

ALTER TABLE DeleteMe add Key(ID);

This makes a full list of all IDs for all duplicate emails and puts an
index on it (you may not need the index if your dupes list  is small
enough). But you said that you don't want to get rid of all dupes, you want
to keep the "first" dupe in each set. Let's make another list:

FROM DeleteMe
GROUP BY email

This is a list of the IDs you want to keep. Now to eliminate the "keepers"
from the full dupes list.

FROM DeleteMe d
      on =
WHERE is null

You now have a list of all duplicate IDs except the "first" ones. We will
use this list to clear the dupes from your original table:

DELETE yourtablenamehere
FROM yourtablenamehere t
INNER JOIN Deleteme d
      ON =

Now, all that's left to do is to clean up our working tables:

DROP TABLE KeepMe, DeleteMe

And you are done!

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

                      "Robert A.                                                          
                      Rosenberg"               To:       <mysql@stripped>    
                      <MySQL@stripped        cc:                                     
                      >                        Fax to:                                 
                                               Subject:  RE: Weeding out duplicates       
                      07/10/2004 10:05                                                    

At 10:25 -0600 on 07/09/2004, Jonathan Duncan wrote about Re: Weeding
out duplicates:

>I want to identify the entries in the table where the email addresses
>are the same as another entry.  Whatever else is in the record does not
>matter to me.
>However, a second requirement for the query is that it show me the last
>duplicate instead of the first.  This way I keep the first entries and
>remove subsequent ones.
>Jonathan Duncan

If you are willing to go with a PHP/MySQL solution as opposed to a
pure MySQL one, try this:

  1) Use that query to get a list of what Email Addresses are duplicated.
  2) Now do a query with a WHERE Email=Duplicate-Email-Address and
fetch the Primary Key.
  3) Do a Delete by Primary Key for all rows fetched in 2 EXCEPT for
those keys fetched in 1.

The sequence is that you do step 1, loop though the results one at a
time (step 2) doing step 3 in that loop.

MySQL General Mailing List
For list archives:
To unsubscribe:

Weeding out duplicatesJonathan Duncan8 Jul
  • RE: Weeding out duplicatesLachlan Mulcahy9 Jul
Re: Weeding out duplicatesjeffrey_n_Dyke9 Jul
Re: Weeding out duplicatesJonathan Duncan9 Jul
RE: Weeding out duplicatesJonathan Duncan9 Jul
  • RE: Weeding out duplicatesRobert A. Rosenberg11 Jul
RE: Weeding out duplicatesJonathan Duncan10 Jul
  • RE: Weeding out duplicatesRobert A. Rosenberg11 Jul
RE: Weeding out duplicatesJonathan Duncan12 Jul
RE: Weeding out duplicatesSGreen14 Jul