List:General Discussion« Previous MessageNext Message »
From:Martin Ramsch Date:September 16 1999 5:51pm
Subject:Re: Enumerating rows and cumulating values
View as plain text  
On Wed, 1999-09-15 23:02:36 -0400, James Manning wrote:
> [ Wednesday, September 15, 1999 ] Manuel Coral A. wrote:
[...]

> > mysql> select * from cuenta;
> > +-----------+---------+
> > | name      | value   |
> > +-----------+---------+
> > | other     |  5.0000 |
> > | another   |  2.0000 |
> > | something | 20.0000 |
> > | otros     | 30.0000 |
> > +-----------+---------+
> > 4 rows in set (0.00 sec)
> > 
> > mysql> select sum(t1.name>=t2.name && t1.value>=t2.value) as
> num, t1.*,
> > sum(t2.value * (t1.name>=t2.name && t1.value>=t2.value)) as
> cumulado from
> > cuenta as t1, cuenta as t2 group  by t1.name, t1.value;
> > +------+-----------+---------+----------+
> > | num  | name      | value   | cumulado |
> > +------+-----------+---------+----------+
> > |    1 | another   |  2.0000 |   2.0000 |
> > |    2 | other     |  5.0000 |   7.0000 |
> > |    3 | otros     | 30.0000 |  37.0000 |
> > |    3 | something | 20.0000 |  27.0000 |
> > +------+-----------+---------+----------+
> > 4 rows in set (0.00 sec)
> > 
> > don't work!, [...]
[...]

> AFAICT, Martin's answer very subtlely relies on increasing
> (alphabetically) name having increasing (numerically) value...

Not quite, but you'll see in a second. :)

> Also AFAICT, there's no need to check value with the restriction of
> unique names, as the names give us the necessary ordering...

Yes, for this example data this is true.

The crucial point is, I try to sort the rows by comparing each row
with each others, and counting how often the row in question is
greater or equal to the other rows.

For the smallest row, this is only true when compared with itself,
because it's equal to itself (of course), but never greater then other
rows: comparison count is 1.
  For the next to smallest row, it's equal to itself and it's greater
than the smallest row: comparison count is 2.
  The third row is equal to itself and greater than the two smallest
rows, so: comparison count is 3.
  And so on ...

Thus counting these comparisons gives me a sorting criterion!

But how to compare rows, so I can say this is greater than that?

What one needs to compare is the PRIMARY KEY!
(Or in other words, an as-small-as-possible combination of fields that
is enough to identify each row.)

If names are guaranteed to be unique -- as is the case in the example
-- then the field 'name' is a suitable primary key.  And so in the
example it suffices to use only this single field for comparisons.

But if the same name could appear several times (i.e. it's not unique),
then we need a larger primary key:  the combination (name,value) works.


Here comes the ERROR in my query:
For comparing a combination of two field, one has to put the fields in
some order, or significance and then compare lexicographically!
(Here I assign 'name' the bigger significane, and then compare 'value')
If I could compare pairs of values, I'd write:
  (t1.name, t1.value) >= (t2.name, t2.value)
But this is not possible, so the right way to compare is:
  (t1.name > t2.name ) OR ( t1.name=t2.name AND t1.value >= t2.value )

For a 3-tupel  (a1,b1,c1) >= (a2,b2,c2)  this would be
  a1>a2 || ( a1=a2 && ( b1>b2 || ( b1=b2 && c1>=c2 ) ) )


Take this exmaple data:

   SELECT * FROM cuenta;
   +-----------+-------+
   | name      | value |
   +-----------+-------+
   | other     |  5.00 |
   | another   |  2.00 |
   | something | 20.00 |
   | otros     | 30.00 |
   | other     |  7.00 |
   +-----------+-------+

The name 'other' exists twice, so only comparing the name won't work
here.  But this works:

 SELECT
   SUM(t1.name>t2.name || t1.name=t2.name && t1.value>=t2.value) AS num
 , t1.*
 , SUM(t2.value * (t1.name>t2.name || t1.name=t2.name &&
t1.value>=t2.value)) AS cumulado
 FROM cuenta AS t1
    , cuenta AS t2
 GROUP BY t1.name
        , t1.value;
 +-----+-----------+-------+----------+
 | num | name      | value | cumulado |
 +-----+-----------+-------+----------+
 |   1 | another   |  2.00 |     2.00 |
 |   2 | other     |  5.00 |     7.00 |
 |   3 | other     |  7.00 |    14.00 |
 |   4 | otros     | 30.00 |    44.00 |
 |   5 | something | 20.00 |    64.00 |
 +-----+-----------+-------+----------+


But if there were complete duplicates in the table 'cuenta', that is
identical rows, then there doesn't exist a primary key, and this
method of counting comparisons can't work.  That is what I meant
writing "no duplicates".


I hope I'v eliminated all clarities now? :-)

Regards,
  Martin
-- 
Martin Ramsch <m.ramsch@stripped> <URL: http://home.pages.de/~ramsch/ >
PGP KeyID=0xE8EF4F75 FiPr=52 44 5E F3 B0 B1 38 26  E4 EC 80 58 7B 31 3A D7
Thread
Smybolic Links as dataRobert Canary23 Aug
  • Re: Smybolic Links as dataJim Faucette23 Aug
  • Re: Smybolic Links as dataThimble Smith23 Aug
  • Re: Smybolic Links as dataRobert Canary23 Aug
    • Re: Smybolic Links as dataThimble Smith23 Aug
  • Enumerating rows and cumulating valuesDino25 Aug
    • Enumerating rows and cumulating valuessinisa25 Aug
    • Re: Enumerating rows and cumulating valuesMartin Ramsch13 Sep
      • Re: Enumerating rows and cumulating valuesManuel Coral A.16 Sep
        • Re: Enumerating rows and cumulating valuesJames Manning16 Sep
          • Re: Enumerating rows and cumulating valuesMartin Ramsch16 Sep
        • Re: Enumerating rows and cumulating valuesMartin Ramsch16 Sep