List:General Discussion« Previous MessageNext Message »
From:Rhino Date:April 5 2005 4:43pm
Subject:Re: Duplicated records
View as plain text  
Oh, that's an entirely different problem than the one I understood from your
original question.

Your new explanation is clearer but I'm still not sure I follow all of it.
The formatting/wrapping of the example makes things a bit fuzzy too. Am I
right in assuming that on the first row of the result, the OperatorID is
2661, the QAID is 5334 and the NTID is JulieAnt? If so, is JulieAnt the
reviewer? Also are the QAID and the NTID different ways of saying the same
thing, such as the fact that JulieAnt's employee number is 5334, or is it
just a coincidence that they are the same on both rows?

I'm assuming that a Batch has two different operators if one shift starts
the batch but the batch isn't finished by the time the shift ends so a
second operator finishes the Batch. Will you ever have cases where a Batch
takes so long that it takes more than two shifts to finish it? For instance,
would you ever see a case where the company gets an order for a billion
grapple grommets and it takes 9 shifts to finish it? I'm trying to figure
out if you also need to worry about batches that are split over more than
two shifts.

Does the Batch value uniquely identify a particular Batch or is it the
combination of Batch and KeyDate?

Ok, assuming for the moment that Batch alone uniquely identifies a
particular Batch, and that you care about Batches that are split across two
*or more* shifts, you need a query like this to find those Batches:

select Batch, count(*) as count
from QA
group by Batch
having count >= 2;

That will return the Batch numbers alone.

Then, you simply do a second Select to get the other properties of the Batch
that you care about. That query would look very much like the one in your
example except that you would do it like so:

SELECT ID, Batch, KeyDate, OperatorID, QAID, NTID
 FROM QA
 WHERE Batch in ('439584414', '123456', '999444');

In other words, you'd put a list consisting of all of the values returned
from the one that got the Batch numbers into the IN clause.

Now, that could be a little tedious if there were a lot of 'split' batches
and you may also run into an issue if the number of of values in the IN
clause exceeded MySQL's limit, whatever that is. (Or maybe MySQL has a limit
on the maximum *length* of the IN clause; if it does, that could bite you
too.) To get around that, you could just repeat the select with the IN
clause for subsets of the results from the first query or even just do your
original query once for each of the values returned by the counting query.

A much better alternative would be to use a subquery but that is only an
option if you are running MySQL 4.1 or later. That would let you combine
both queries together into a single big query. Are you running V4.1 or
later? If you are, I could take a stab at the combined query....

Rhino


----- Original Message ----- 
From: "Scott Hamm" <linuxgold@stripped>
To: "Rhino" <rhino1@stripped>
Cc: "Mysql" <mysql@stripped>
Sent: Tuesday, April 05, 2005 11:14 AM
Subject: Re: Duplicated records


> Sorry for the confusion.  I don't mean a duplicated records, but to
> find duplicated Batch where two different operators worked on a single
> batch (one started off, then another one to finish the batch) and a
> single reviewer to review a batch. I need a list that lists duplicated
> Batches with different operators that worked on that batch.
>
> For example: (for clarification)
> SELECT ID, Batch, KeyDate, OperatorID, QAID, NTID
> FROM QA
> WHERE Batch='439584414';
>
> ID         Batch              KeyDate                      OperatorID
> QAID      NTID
> 90577    439584414 2004-10-03 00:00:00 2661       5334 JulieAnt
> 90575    439584414 2004-10-03 00:00:00 5657       5334 JulieAnt
>
>
> On Apr 5, 2005 10:54 AM, Rhino <rhino1@stripped> wrote:
> >
> > ----- Original Message -----
> > From: "Scott Hamm" <linuxgold@stripped>
> > To: "Mysql" <mysql@stripped>
> > Sent: Tuesday, April 05, 2005 10:05 AM
> > Subject: Duplicated records
> >
> > > Here is my novice question:
> > > I need to find duplicates in "Batch" category when I issued this
query:
> > >
> > > SELECT * FROM QA WHERE Batch=439584414;
> > >
> > > Result:
> > > 90577
> > > 1 2661 5334 JulieAnt 25 5 5 439584414 2004-10-03 00:00:00 2004-10-03
> > > 00:00:00 0 90575
> > > 1 5657 5334 JulieAnt 25 5 5 439584414 2004-10-03 00:00:00 2004-10-03
> > 00:00:00 0
> > >
> > > How do I issue a query that finds duplicated Batch number?
> > >
> > First of all, I think you made a mistake when putting this data into
your
> > note, a mistake which is compounded by the wrapping that the email
program
> > did. I *think* you meant to display two rows with the '90577' value at
the
> > end of the second row, not before the first row. This makes the example
a
> > bit confusing and hard to follow.
> >
> > The two rows you provide in your example are not duplicates of each
other.
> > Even assuming that the '90577' actually belongs at the end of the second
row
> > rather than the start of the first row, the two rows are different: the
> > first row ends in '90575', not '90577' and the second value in each row
is
> > also different: '2661' and '5657'. Therefore, you're not really trying
to
> > find duplicate records because a duplicate row would be one that is
> > identical in every column, not just identical in several columns. I hope
> > that doesn't sound like I am splitting hairs; maybe I am. ;-)
> >
> > Anyway, the standard method for finding duplicates of rows goes like
this
> > and can be adapted for your situation. Let's say that you have a table
> > called Foo with columns foo1, foo2, and foo3 and you want to find all of
the
> > duplicate rows in the table, i.e. any row whose foo1, foo2, and foo3
value
> > is identical to the foo1, foo2, and foo3 value of another table. You
would
> > write this query:
> >
> >     select foo1, foo2, foo3, count(*) as num
> >     from Foo1
> >     group by foo1, foo2, foo3
> >     having num > 1
> >     order by num;
> >
> > Your result might look like this:
> >
> >     foo1    foo2    foo3    num
> >     Daffy    D        Duck    1
> >     Bugs    B        Bunny    1
> >     Elmer    A        Fudd    3
> >
> > This result means that Daffy D Duck and Bugs B Bunny each occur once in
the
> > table but Elmer A Fudd occurs three times.
> >
> > In the case you mentioned in your email, where the rows aren't complete
> > duplicates of one another, just write the query so that you omit the
columns
> > which are allowed to be different. In your case, that would appear to be
all
> > but the second and last columns.
> >
> > However, I think if you get to the point of having duplicate records in
a
> > table when you don't want them, finding them is like shutting the barn
door
> > after the horse is already gone. I think you need to reconsider your
design,
> > specifically your primary key, and change that primary key to *PREVENT*
the
> > duplicate keys in the first place.
> >
> > For instance, in the case of the Foo1 table, I can prevent duplicate
rows by
> > choosing a good primary key; in this case I would choose the combination
of
> > all three columns in the table; that would ensure that I never stored
more
> > than 1 person named Elmer A Fudd. [Actually, that's not a great example!
It
> > might be perfectly valid to have two people named Elmer A Fudd - or John
A
> > Smith - so I probably need to add additional columns to my table to
ensure
> > that the rows are unique; something like Social Security Number which is
> > (supposed to be) unique would be ideal for this purpose.]
> >
> > Rhino
> >
> > --
> > No virus found in this outgoing message.
> > Checked by AVG Anti-Virus.
> > Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005
> >
> >
>
>
> -- 
> Power to people, Linux is here.
>
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
>
>
> -- 
> No virus found in this incoming message.
> Checked by AVG Anti-Virus.
> Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005
>
>



-- 
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.308 / Virus Database: 266.9.2 - Release Date: 05/04/2005

Thread
Duplicated recordsScott Hamm5 Apr
  • Re: Duplicated recordsJeremy Cole5 Apr
  • Re: Duplicated recordsRhino5 Apr
    • Re: Duplicated recordsScott Hamm5 Apr
      • Re: Duplicated recordsPeter Brawley5 Apr
  • Re: Duplicated recordsRhino5 Apr
    • Re: Duplicated recordsScott Hamm5 Apr
  • Re: Duplicated recordsRhino5 Apr