----- Original Message -----
From: "Scott Hamm" <linuxgold@stripped>
To: "Mysql" <mysql@stripped>
Sent: Tuesday, April 05, 2005 10:05 AM
Subject: Duplicated records
> Here is my novice question:
> I need to find duplicates in "Batch" category when I issued this query:
> SELECT * FROM QA WHERE Batch=439584414;
> 1 2661 5334 JulieAnt 25 5 5 439584414 2004-10-03 00:00:00 2004-10-03
> 00:00:00 0 90575
> 1 5657 5334 JulieAnt 25 5 5 439584414 2004-10-03 00:00:00 2004-10-03
> How do I issue a query that finds duplicated Batch number?
First of all, I think you made a mistake when putting this data into your
note, a mistake which is compounded by the wrapping that the email program
did. I *think* you meant to display two rows with the '90577' value at the
end of the second row, not before the first row. This makes the example a
bit confusing and hard to follow.
The two rows you provide in your example are not duplicates of each other.
Even assuming that the '90577' actually belongs at the end of the second row
rather than the start of the first row, the two rows are different: the
first row ends in '90575', not '90577' and the second value in each row is
also different: '2661' and '5657'. Therefore, you're not really trying to
find duplicate records because a duplicate row would be one that is
identical in every column, not just identical in several columns. I hope
that doesn't sound like I am splitting hairs; maybe I am. ;-)
Anyway, the standard method for finding duplicates of rows goes like this
and can be adapted for your situation. Let's say that you have a table
called Foo with columns foo1, foo2, and foo3 and you want to find all of the
duplicate rows in the table, i.e. any row whose foo1, foo2, and foo3 value
is identical to the foo1, foo2, and foo3 value of another table. You would
write this query:
select foo1, foo2, foo3, count(*) as num
group by foo1, foo2, foo3
having num > 1
order by num;
Your result might look like this:
foo1 foo2 foo3 num
Daffy D Duck 1
Bugs B Bunny 1
Elmer A Fudd 3
This result means that Daffy D Duck and Bugs B Bunny each occur once in the
table but Elmer A Fudd occurs three times.
In the case you mentioned in your email, where the rows aren't complete
duplicates of one another, just write the query so that you omit the columns
which are allowed to be different. In your case, that would appear to be all
but the second and last columns.
However, I think if you get to the point of having duplicate records in a
table when you don't want them, finding them is like shutting the barn door
after the horse is already gone. I think you need to reconsider your design,
specifically your primary key, and change that primary key to *PREVENT* the
duplicate keys in the first place.
For instance, in the case of the Foo1 table, I can prevent duplicate rows by
choosing a good primary key; in this case I would choose the combination of
all three columns in the table; that would ensure that I never stored more
than 1 person named Elmer A Fudd. [Actually, that's not a great example! It
might be perfectly valid to have two people named Elmer A Fudd - or John A
Smith - so I probably need to add additional columns to my table to ensure
that the rows are unique; something like Social Security Number which is
(supposed to be) unique would be ideal for this purpose.]
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005