List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:September 30 2008 4:00pm
Subject:Re[4]: SELECT DISTINCT with ORDER BY implementation
View as plain text  
Hi!

>>>>> "Andrew" == Andrew Aksyonoff <shodan@stripped> writes:

Andrew> Hello Sergey,
Andrew> Monday, September 15, 2008, 10:41:31 PM, you wrote:
>>> in MySQL but in general case it can't assume any order and will have
>>> to re-sort the sub-select result by outer GROUP BY instead of inner
>>> ORDER BY. If that sorting is stable, this should work, but can we rely
SP> Yes. This is documented behavior:
SP> http://dev.mysql.com/doc/refman/5.0/en/select.html :
SP> "If you use GROUP BY, output rows are sorted according to the GROUP BY
SP> columns as if you had an ORDER BY for the same columns. To avoid the
SP> overhead of sorting that GROUP BY produces, add ORDER BY NULL:

Andrew> Well, this snippet documents how the *grouped* rows will be ordered,
Andrew> but the question is about the properties of specific sorting algorithm
Andrew> that is internally used to implement GROUP BY.

Andrew> I'm not sure if I'm clear enough so let me provide an example. Assume
Andrew> that the inner SELECT produces the following:

Andrew> id=1, sortkey=123, groupkey=33
Andrew> id=2, sortkey=124, groupkey=33
Andrew> id=3, sortkey=125, groupkey=11
Andrew> id=4, sortkey=126, groupkey=11
Andrew> id=5, sortkey=127, groupkey=22
Andrew> id=6, sortkey=128, groupkey=22

Andrew> I suppose that 'GROUP BY groupkey' will have to sort the incoming
Andrew> rows by groupkey, then go over it sequentially, keeping only the
Andrew> first encountered row for every given groupkey. 

That is one algoritm, but MySQL has others.

Andrew> However if the specific sorting algorithm is not stable it *might*
Andrew> change the order and produce something like that for temporary
Andrew> sorted set:

It's not stable; MySQL is using several different technics to
calculate GROUP BY and may thus return the rows in any order within
the group by.

Andrew> id=4, sortkey=126, groupkey=11
Andrew> id=3, sortkey=125, groupkey=11
Andrew> id=6, sortkey=128, groupkey=22
Andrew> id=5, sortkey=127, groupkey=22
Andrew> id=1, sortkey=123, groupkey=33
Andrew> id=2, sortkey=124, groupkey=33

Andrew> And put id=4 instead of id=3 into the result set.

Andrew> So the question is a bit more subtle :) It's whether the algorithm
Andrew> that GROUP BY (and possibly everything else) uses stable or not.
Andrew> I'd bet a quarter that it is but just want to make sure :)

In general with SQL:  Don't assume any order of rows if you don't
explicitely specify a sort order.

You can send the quarter to 'the well being of dolphin fund'.

Regards,
Monty
Thread
RE: SELECT DISTINCT with ORDER BY implementationMariella Petrini11 Sep
RE: SELECT DISTINCT with ORDER BY implementationMariella Petrini11 Sep