List:General Discussion« Previous MessageNext Message »
From:David Kramer Date:July 30 2002 10:56pm
Subject:RE: removing duplicate records
View as plain text  
Vivian - Im just curious as to why I shouldnt trust group by?  Is there a
bug or something within Mysql regarding the group by clause?

Walt - you could also create a compound primary key that consists of all
columns, then create a unique constraint on the Primary key.....  I've used
this many times in the past, and it works quite well.  The only caveat is
the SQL will puke when it tries to load the duplicate records into the
table, I personally would use the DEDUP process of Insert/select with a
group by...  But you might want to see what Vivian's response is on the
group by issue.  

DK

-----Original Message-----
From: Vivian Wang [mailto:vivianwang@stripped]
Sent: Tuesday, July 30, 2002 2:37 PM
To: walt; David Kramer; mysql@stripped
Subject: Re: removing duplicate records


Walt,

Don't trust group by.
I am using mysqldump and sort -u on *nix for duplicate.

Try both ways, let me know your result.

Thanks
At 04:11 PM 7/30/2002 -0400, walt wrote:
>Thanks David!
>
>The entire row, not just one or two columns, is a duplicate which makes
life
>fun..
>:-)
>
>I can key or unique index only a few columns once the data is cleaned up to
>prevent this problem.
>
>If I create an identical table and include either a key or unique index
>(innodb seems to like the index better) on all the columns and do a
>
>insert into new_table using select * from old_table
>
>will mysql quit inserting once a duplicate is hit, or will it keep going
and
>skip over the duplicates?
>
>walt
>
>On Tuesday 30 July 2002 03:57 pm, David Kramer wrote:
> > You could always use an insert statement into a second table, when
> > performing the insert use a GROUP BY clause to consalidate your records
> >
> > something along these lines but this isnt 100% accurate, I would need
the
> > table DDL and business rules behind the DEDUP
> >
> > Insert into tableB
> > (
> >       column names, ...
> >
> >
> > )
> > (select
> >       column a,
> >       column b,
> >       max(column c), --or you could use MIN
> > from
> >       table A
> >
> > group by
> >       column a,
> >       column b);
> >
> >
> >
> >
> >
> > **JUST make sure your Identifing column, i.e. the column you use to tell
> > which is a duplicate record or not is included in the group by.  Also
what
> > are the business rules behind the DEDUP(Deduplication)? Are the other
> > values contained in the other columns necessary?  If you tell me more
about
> > what your trying to do and provide some Table DDL I can help you write
this
> > query.  Just let me know!
> >
> > Thanks,
> >
> > DK
> >
> >   group by statement
> >
> > -----Original Message-----
> > From: walt [mailto:walt@stripped]
> > Sent: Tuesday, July 30, 2002 12:43 PM
> > To: mysql@stripped
> > Subject: removing duplicate records
> >
> >
> > Does anyone know a good way find and remove duplicate records from a
table?
> > I can create an identical table structure and use a script to pull
records
> > from the existing table and insert them into the new table if they are
not
> > duplicates, but I'd rather not do it that way. Here is an example of an
sql
> > script  I use for Oracle databases
> >
> > delete   from employee a
> > where    rowid < (
> > select  max(rowid)
> > from    employee b
> > where   b.COL1 = a.COL1
> > and     b.COL2 = a.COL2
> > and     b.COL# = a.COL#);
> >
> > sql, query
> >
> > Thanks in advance!
>
>
>
>
>---------------------------------------------------------------------
>Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
>
>To request this thread, e-mail <mysql-thread115946@stripped>
>To unsubscribe, e-mail 
><mysql-unsubscribe-vivianwang=tcindex.com@stripped>
>Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Thread
removing duplicate recordswalt30 Jul
  • Re: removing duplicate recordsGerald Clark30 Jul
RE: removing duplicate recordsDavid Kramer30 Jul
  • Re: removing duplicate recordswalt31 Jul
    • Re: removing duplicate recordsVivian Wang31 Jul
    • RE: removing duplicate recordsGordon31 Jul
  • getting the maximum valuedestr031 Jul
  • Re: getting the maximum valueGerald Clark31 Jul
RE: removing duplicate recordsDavid Kramer31 Jul
RE: getting the maximum valueChris Kay31 Jul