From:Øystein Grøvlen Date:November 19 2010 11:59am
Subject:Re: bzr commit into mysql-next-mr-bugfixing branch (tor.didriksen:3236)
WL#1393
Hi Tor,

Thanks for the patch.
I am sorry that it has taken me quite some time to do this review.

Most of my comments are in-line.  In addition, I have some questions
on testing:

   - Will the existence of indexes matter for this feature?  Should
     that be tested in some way?
   - All your tests seem to be on data that is partially sorted. Is
     that an issue?
   - Would it be an idea to have a unit test that is closer to the
     actual usage in the MySQL code?  E.g., operates on byte arrays,
     uses the same comparison function.  That could test some of the
     concepts even if the actual code is not tested.
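For the last point, a self-contained sketch of what I have in mind
(std::priority_queue stands in for QUEUE; top_n, ByteKeyLess, and the
fixed 2-byte keys are invented for the example, not taken from the patch):

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for the filesort comparison: fixed-size byte keys
// compared with memcmp, as the real code does for packed sort keys.
struct ByteKeyLess {
  size_t key_len;
  bool operator()(const std::string &a, const std::string &b) const {
    return std::memcmp(a.data(), b.data(), key_len) < 0;
  }
};

// Keep the n smallest byte keys: a max-heap of size n whose top is replaced
// whenever a smaller key arrives -- the same idea as Bounded_queue::push().
std::vector<std::string> top_n(const std::vector<std::string> &input,
                               size_t n, size_t key_len) {
  ByteKeyLess less = {key_len};
  std::priority_queue<std::string, std::vector<std::string>, ByteKeyLess>
      heap(less);
  for (size_t i = 0; i < input.size(); i++) {
    if (heap.size() < n)
      heap.push(input[i]);
    else if (less(input[i], heap.top())) {  // smaller than current worst
      heap.pop();
      heap.push(input[i]);
    }
  }
  std::vector<std::string> result;
  while (!heap.empty()) {
    result.push_back(heap.top());
    heap.pop();
  }
  std::sort(result.begin(), result.end(), less);  // the final qsort step
  return result;
}
```

A test along these lines exercises the top-N concept and the memcmp-style
comparison even though the actual Bounded_queue code is not involved.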


On 11/11/10 01:43 PM, Tor Didriksen wrote:
 > #At file:///export/home/didrik/repo/next-mr-opt-team-wl1393-merge/ 
based on revid:tor.didriksen@stripped
 >
 >   3236 Tor Didriksen	2010-10-21
 >        WL#1393 Optimizing filesort with small limit
 >
 >        Many web customers have to do
 >        "SELECT ... ORDER BY non_index_column LIMIT X",
 >
 >        When X *<Row Size>  is smaller than sort_buff_size we can use
 >        the following algoritm to speed up the sort:
 >
 >        - Create a queue to hold 'limit' keys.
 >        - Scan through the table and store the first (last if DESC) 
keys in the queue
 >        - Return values from queue
 >       @ libmysqld/CMakeLists.txt
 >          Add Bounded_queue
 >       @ libmysqld/Makefile.am
 >          Add Bounded_queue
 >       @ mysql-test/include/order_by.inc
 >          Add new tests.
 >       @ mysql-test/include/select.inc
 >          Fix typo in bug number.
 >       @ mysql-test/r/order_by_none.result
 >          Added new tests.
 >       @ mysql-test/r/order_by_sortkey.result
 >          New test.
 >       @ mysql-test/r/select_none.result
 >          Fix typo in bug number.
 >       @ mysql-test/t/order_by_sortkey.test
 >          New test.
 >       @ mysys/CMakeLists.txt
 >          Compile standalone queues test executable.
 >       @ mysys/queues.c
 >          Fix un-maintained code: s/bool/my_bool/
 >       @ sql/CMakeLists.txt
 >          Add Bounded_queue to gunit tests.
 >       @ sql/Makefile.am
 >          Add Bounded_queue
 >       @ sql/bounded_queue.cc
 >          Since Bounded_queue is templatized, the entire 
implementation must go in the .h file
 >          The free funciton get_merge_many_buffs_cost_fast is here so 
that we can unit test it.

Since the entire implementation of Bounded_queue is in the header file,
I see no reason to call the above file bounded_queue.cc.  I suggest
instead letting the name reflect what it contains.  (Another alternative is
to move the entire content to filesort.h. That file does not seem to
contain stuff that complicates a unit test.)

NB! Typo: funciton

 >       @ sql/bounded_queue.h
 >          This is a wrapper on top of QUEUE and the queue_xxx() functions.
 >       @ sql/filesort.cc
 >          Add support for Priority Queue sort for queries with a LIMIT.
 >          Remove dead code.

I think you should be a bit more specific here.  For the PQ things,
one can probably get the necessary info from the worklog, but for
other changes a bit more detail in the commit comments would make it
easier to understand why the changes were done (e.g. why the code is
dead).

 >       @ sql/filesort.h
 >          New argument to filesort()
 >       @ sql/sql_const.h
 >          Use double, rather than int, literals (avoids lots of casting).
 >       @ sql/sql_delete.cc
 >          New argument to filesort()
 >       @ sql/sql_select.cc
 >          Handle new filesort() using priority queue:
 >          - Add some more DBUG_PRINT statements before calls to 
create_sort_index()
 >          - create_sort_index() can use the new PQ algorithm if 
(!group&&  tables==1)
 >          - use information from filesort() if we have OPTION_FOUND_ROWS
 >          - in end_send(): stop if we are about to read more rows than 
allocated by the PQ
 >          - pass OPTION_FOUND_ROWS on to filesort() for single-table 
queries.
 >       @ sql/sql_sort.h
 >          Rename st_sort_param/SORTPARAM to Sort_param.
 >          Add constructor.
 >          Add init() function.
 >       @ sql/sql_table.cc
 >          New argument to filesort()
 >       @ sql/sql_update.cc
 >          New argument to filesort()
 >       @ sql/uniques.cc
 >          Rename SORTPARAM to Sort_param.
 >       @ unittest/gunit/CMakeLists.txt
 >          Add gunit test for Bounded_queue
 >       @ unittest/gunit/bounded_queue-t.cc
 >          Add new unit tests for Bounded_queue and merge-cost estimation.
 >          Add performance tests for Bounded_queue.
 >
 >      added:
 >        mysql-test/r/order_by_sortkey.result
 >        mysql-test/t/order_by_sortkey.test
 >        sql/bounded_queue.cc
 >        sql/bounded_queue.h
 >        unittest/gunit/bounded_queue-t.cc
 >      modified:
 >        libmysqld/CMakeLists.txt
 >        libmysqld/Makefile.am
 >        mysql-test/include/order_by.inc
 >        mysql-test/include/select.inc
 >        mysql-test/r/order_by_none.result
 >        mysql-test/r/select_none.result
 >        mysys/CMakeLists.txt
 >        mysys/queues.c
 >        sql/CMakeLists.txt
 >        sql/Makefile.am
 >        sql/filesort.cc
 >        sql/filesort.h
 >        sql/sql_const.h
 >        sql/sql_delete.cc
 >        sql/sql_select.cc
 >        sql/sql_sort.h
 >        sql/sql_table.cc
 >        sql/sql_update.cc
 >        sql/uniques.cc
 >        unittest/gunit/CMakeLists.txt

...

 > === modified file 'mysql-test/include/order_by.inc'
 > --- a/mysql-test/include/order_by.inc	2010-09-27 13:20:24 +0000
 > +++ b/mysql-test/include/order_by.inc	2010-10-21 14:12:43 +0000
 > @@ -1364,6 +1364,162 @@ ORDER BY t2.c LIMIT 1;
 >
 >   DROP TABLE t1,t2,t3;
 >
 > +--echo #
 > +--echo # WL#1393 - Optimizing filesort with small limit
 > +--echo #
 > +
 > +CREATE TABLE t1(f0 int auto_increment primary key, f1 int, f2 
varchar(200));
 > +INSERT INTO t1(f1, f2) VALUES
 > +(0,"0"),(1,"1"),(2,"2"),(3,"3"),(4,"4"),(5,"5"),
 > +(6,"6"),(7,"7"),(8,"8"),(9,"9"),(10,"10"),
 > +(11,"11"),(12,"12"),(13,"13"),(14,"14"),(15,"15"),
 > +(16,"16"),(17,"17"),(18,"18"),(19,"19"),(20,"20"),
 > +(21,"21"),(22,"22"),(23,"23"),(24,"24"),(25,"25"),
 > +(26,"26"),(27,"27"),(28,"28"),(29,"29"),(30,"30"),
 > +(31,"31"),(32,"32"),(33,"33"),(34,"34"),(35,"35"),
 > +(36,"36"),(37,"37"),(38,"38"),(39,"39"),(40,"40"),
 > +(41,"41"),(42,"42"),(43,"43"),(44,"44"),(45,"45"),
 > +(46,"46"),(47,"47"),(48,"48"),(49,"49"),(50,"50"),
 > +(51,"51"),(52,"52"),(53,"53"),(54,"54"),(55,"55"),
 > +(56,"56"),(57,"57"),(58,"58"),(59,"59"),(60,"60"),
 > +(61,"61"),(62,"62"),(63,"63"),(64,"64"),(65,"65"),
 > +(66,"66"),(67,"67"),(68,"68"),(69,"69"),(70,"70"),
 > +(71,"71"),(72,"72"),(73,"73"),(74,"74"),(75,"75"),
 > +(76,"76"),(77,"77"),(78,"78"),(79,"79"),(80,"80"),
 > +(81,"81"),(82,"82"),(83,"83"),(84,"84"),(85,"85"),
 > +(86,"86"),(87,"87"),(88,"88"),(89,"89"),(90,"90"),
 > +(91,"91"),(92,"92"),(93,"93"),(94,"94"),(95,"95"),
 > +(96,"96"),(97,"97"),(98,"98"),(99,"99");
 > +
 > +################
 > +## Test sort when source data fits to memory

"fits in memory"?

 > +
 > +SELECT * FROM t1 ORDER BY f1 ASC, f0 LIMIT 100;
 > +SELECT * FROM t1 ORDER BY f1 ASC, f0 LIMIT 30;
 > +SELECT * FROM t1 ORDER BY f1 ASC, f0 LIMIT 0;
 > +SELECT * FROM t1 ORDER BY f2 DESC, f0 LIMIT 30;
 > +SELECT * FROM t1 ORDER BY f2 DESC, f0 LIMIT 0;
 > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 20;
 > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 0;
 > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 10 OFFSET 10;
 > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 0 OFFSET 10;

I assume the reason there are no EXPLAIN queries is that the explain
output does not reflect whether a PQ is used.

 > +
 > +################
 > +## Test sort when source data does not fit to memory
 > +set sort_buffer_size= 32768;
 > +CREATE TEMPORARY TABLE tmp (f1 int, f2 varchar(20));
 > +INSERT INTO tmp SELECT f1, f2 FROM t1;
 > +INSERT INTO t1(f1, f2) SELECT * FROM tmp;
 > +INSERT INTO tmp SELECT f1, f2 FROM t1;
 > +INSERT INTO t1(f1, f2) SELECT * FROM tmp;
 > +
 > +SELECT * FROM t1 ORDER BY f1 ASC, f0 LIMIT 30;
 > +SELECT * FROM t1 ORDER BY f1 ASC, f0 LIMIT 0;
 > +SELECT * FROM t1 ORDER BY f2 DESC, f0 LIMIT 30;
 > +SELECT * FROM t1 ORDER BY f2 DESC, f0 LIMIT 0;
 > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 20;
 > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 0;
 > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 10 OFFSET 10;
 > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 0 OFFSET 10;
 > +
 > +################
 > +## Test with SQL_CALC_FOUND_ROWS

Is it intentional that these tests are run with a small sort buffer?

 > +SELECT SQL_CALC_FOUND_ROWS * FROM t1
 > +ORDER BY f1, f0 LIMIT 30;
 > +SELECT FOUND_ROWS();
 > +
 > +SELECT SQL_CALC_FOUND_ROWS * FROM t1
 > +ORDER BY f1, f0 LIMIT 0;
 > +SELECT FOUND_ROWS();
 > +
 > +SELECT SQL_CALC_FOUND_ROWS * FROM t1 WHERE f1>10
 > +ORDER BY f2, f0 LIMIT 20;
 > +SELECT FOUND_ROWS();
 > +
 > +SELECT SQL_CALC_FOUND_ROWS * FROM t1 WHERE f1>10
 > +ORDER BY f2, f0 LIMIT 0;
 > +SELECT FOUND_ROWS();
 > +
 > +SELECT SQL_CALC_FOUND_ROWS * FROM t1 WHERE f1>10
 > +ORDER BY f2, f0 LIMIT 10 OFFSET 10;
 > +SELECT FOUND_ROWS();
 > +
 > +SELECT SQL_CALC_FOUND_ROWS * FROM t1 WHERE f1>10
 > +ORDER BY f2, f0 LIMIT 0 OFFSET 10;
 > +SELECT FOUND_ROWS();
 > +

...

 > +################
 > +## Test with subqueries
 > +SELECT d1.* FROM t1
 > +LEFT JOIN (SELECT * FROM t1 LIMIT 30) d1 on t1.f1=d1.f1
 > +ORDER BY d1.f2 DESC LIMIT 30;

I do not think the result of the above query is well defined, since the
contents of the derived table, d1, are not explicitly ordered.

 > +
 > +SELECT * FROM t1 WHERE f1 = (SELECT f1 FROM t1 ORDER BY 1 LIMIT 1);
 > +
 > +--error ER_SUBQUERY_NO_1_ROW
 > +SELECT * FROM t1 WHERE f1 = (SELECT f1 FROM t1 ORDER BY 1 LIMIT 2);
 > +
 > +DROP TABLE t1, tmp;
 > +DROP VIEW v1, v2;
 > +
 > +--echo # end of WL#1393 - Optimizing filesort with small limit
 >

...

 > === added file 'mysql-test/t/order_by_sortkey.test'

I do not quite understand the reasoning behind the name here.  Is the
idea that it tests the case where only the sort keys, and not addon
fields, fit in memory?  If you cannot come up with a better name, at
least add an explanation at the start of the file.

 > --- a/mysql-test/t/order_by_sortkey.test	1970-01-01 00:00:00 +0000
 > +++ b/mysql-test/t/order_by_sortkey.test	2010-10-21 14:12:43 +0000
 > @@ -0,0 +1,64 @@
 > +#
 > +# WL#1393 - ORDER BY with LIMIT tests
 > +#
 > +# A big test in a separate file, so that we can run the original
 > +# order_by test with --debug without timeout.

Is this test so big that it should be defined as a big test?

 > +#
 > +CREATE TABLE t1(
 > +  f0 int auto_increment PRIMARY KEY,
 > +  f1 int,
 > +  f2 varchar(200)
 > +) ENGINE=MyISAM;
 > +
 > +INSERT INTO t1(f1, f2) VALUES
 > +(0,"0"),(1,"1"),(2,"2"),(3,"3"),(4,"4"),(5,"5"),
 > +(6,"6"),(7,"7"),(8,"8"),(9,"9"),(10,"10"),
 > +(11,"11"),(12,"12"),(13,"13"),(14,"14"),(15,"15"),
 > +(16,"16"),(17,"17"),(18,"18"),(19,"19"),(20,"20"),
 > +(21,"21"),(22,"22"),(23,"23"),(24,"24"),(25,"25"),
 > +(26,"26"),(27,"27"),(28,"28"),(29,"29"),(30,"30"),
 > +(31,"31"),(32,"32"),(33,"33"),(34,"34"),(35,"35"),
 > +(36,"36"),(37,"37"),(38,"38"),(39,"39"),(40,"40"),
 > +(41,"41"),(42,"42"),(43,"43"),(44,"44"),(45,"45"),
 > +(46,"46"),(47,"47"),(48,"48"),(49,"49"),(50,"50"),
 > +(51,"51"),(52,"52"),(53,"53"),(54,"54"),(55,"55"),
 > +(56,"56"),(57,"57"),(58,"58"),(59,"59"),(60,"60"),
 > +(61,"61"),(62,"62"),(63,"63"),(64,"64"),(65,"65"),
 > +(66,"66"),(67,"67"),(68,"68"),(69,"69"),(70,"70"),
 > +(71,"71"),(72,"72"),(73,"73"),(74,"74"),(75,"75"),
 > +(76,"76"),(77,"77"),(78,"78"),(79,"79"),(80,"80"),
 > +(81,"81"),(82,"82"),(83,"83"),(84,"84"),(85,"85"),
 > +(86,"86"),(87,"87"),(88,"88"),(89,"89"),(90,"90"),
 > +(91,"91"),(92,"92"),(93,"93"),(94,"94"),(95,"95"),
 > +(96,"96"),(97,"97"),(98,"98"),(99,"99");
 > +
 > +CREATE TEMPORARY TABLE tmp (f1 int, f2 varchar(20));
 > +INSERT INTO tmp SELECT f1,f2 FROM t1;
 > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
 > +INSERT INTO tmp SELECT f1,f2 FROM t1;
 > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
 > +
 > +# Test when only sortkeys fits to memory
 > +set sort_buffer_size= 32768;

Why is sort_buffer_size set in the middle of the insert statements?

 > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
 > +INSERT INTO tmp SELECT f1,f2 FROM t1;
 > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
 > +INSERT INTO tmp SELECT f1,f2 FROM t1;
 > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
 > +INSERT INTO tmp SELECT f1,f2 FROM t1;
 > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
 > +INSERT INTO tmp SELECT f1,f2 FROM t1;
 > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
 > +INSERT INTO tmp SELECT f1,f2 FROM t1;
 > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
 > +INSERT INTO tmp SELECT f1,f2 FROM t1;
 > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
 > +
 > +FLUSH STATUS;
 > +SHOW SESSION STATUS LIKE 'Sort%';
 > +
 > +SELECT * FROM t1 ORDER BY f2 LIMIT 100;

I notice that if I run this query without your new feature, I get a
different ordering of rows, but maybe this is unavoidable.  In order
to test what you are actually testing, will you not need to sort on
columns that are not already ordered?

 > +
 > +SHOW SESSION STATUS LIKE 'Sort%';
 > +
 > +DROP TABLE t1, tmp;
 >

...

 > === added file 'sql/bounded_queue.cc'
 > --- a/sql/bounded_queue.cc	1970-01-01 00:00:00 +0000
 > +++ b/sql/bounded_queue.cc	2010-10-21 14:12:43 +0000
 > @@ -0,0 +1,67 @@
 > +/* Copyright (c) 2010, Oracle and/or its affiliates. All rights 
reserved.
 > +
 > +   This program is free software; you can redistribute it and/or modify
 > +   it under the terms of the GNU General Public License as published by
 > +   the Free Software Foundation; version 2 of the License.
 > +
 > +   This program is distributed in the hope that it will be useful,
 > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
 > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 > +   GNU General Public License for more details.
 > +
 > +   You should have received a copy of the GNU General Public License
 > +   along with this program; if not, write to the Free Software
 > +   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 
02111-1307  USA */
 > +
 > +#include "bounded_queue.h"

Why include bounded_queue.h? It does not seem to be needed.

 > +#include "sql_const.h"
 > +#include "sql_sort.h"
 > +
 > +

I miss documentation for this function, and an explanation of the
theory behind the calculations.

 > +double get_merge_many_buffs_cost_fast(ha_rows num_buffers, ha_rows 
max_n_elems,
 > +                                      ha_rows last_n_elems, uint 
elem_size)
 > +{

There is another function get_merge_many_buffs_cost, that seems to
compute something that is not quite the same.  I think you should use
another name here.  This function seems to include not just the cost
of merging buffers, but also the cost of sorting.

 > +  double total_cost;
 > +  const double MERGEBUFF_double= (double) MERGEBUFF;
 > +
 > +  /* Calc CPU cost of sorting each buffer */

Each buffer?  Seems more like the cost of sorting all buffers.

 > +  total_cost= ( num_buffers * max_n_elems * log(1.0 + max_n_elems) +
 > +                last_n_elems * log(1.0 + last_n_elems) )
 > +    / TIME_FOR_COMPARE_ROWID;

Why do we add 1.0 in the formula above?  If max_n_elems is 1, I would
assume the cost of sorting to be 0.

 > +
 > +  /* Simulate behavior of merge_many_buff(). */
 > +  while (num_buffers>= MERGEBUFF2)
 > +  {
 > +    /* Calculate # of calls to merge_buffers() */
 > +    ha_rows loop_limit= num_buffers - MERGEBUFF*3/2;

The name loop_limit is a bit too generic for my taste.  Would it be
possible to come up with a name that better describes what it represents?

 > +    ha_rows num_merge_calls= 1 + loop_limit/MERGEBUFF;
 > +    ha_rows i_end_value= num_merge_calls * MERGEBUFF;
 > +
 > +    /* Cost of merge sort 'num_merge_calls' */
 > +    total_cost+= num_merge_calls *
 > +      ( 2.0 * (max_n_elems * MERGEBUFF_double * elem_size) / IO_SIZE
 > +        + max_n_elems * MERGEBUFF_double * log(MERGEBUFF_double) /
 > +          (TIME_FOR_COMPARE_ROWID * M_LN2));
 > +
 > +    /* # of records in last buffer */
 > +    last_n_elems+= (num_buffers - i_end_value) * max_n_elems;
 > +
 > +    /* cost of merge sort of last buffer */
 > +    total_cost+= 2.0 * ((double)last_n_elems * elem_size) / IO_SIZE
 > +      + double(last_n_elems) * log(MERGEBUFF_double) /
 > +        (TIME_FOR_COMPARE_ROWID * M_LN2);
 > +
 > +    num_buffers= num_merge_calls;
 > +    max_n_elems*= MERGEBUFF;
 > +  }
 > +

It seems to me that it should be possible to avoid this loop, but it
will probably require some complex expressions involving logarithms to
get the exact same estimate. But how precise do we really need to
be here?  In the end, we are going to compare this to a number that
is based on pretty coarse heuristics (PQ_slowness).

Will it not be possible to come up with a good-enough estimate that
involves a bit less computation?
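To illustrate the kind of coarse alternative I mean (the constants and
the one-read-plus-one-write-per-pass assumption are mine, not from the
patch; io_size and time_for_compare are placeholder values, not the real
IO_SIZE / TIME_FOR_COMPARE_ROWID):

```cpp
#include <cassert>
#include <cmath>

// Rough closed-form estimate instead of simulating merge_many_buff():
// about N*log(N) comparisons to sort everything, plus one read and one
// write of all data per merge pass, with roughly log_fan_in(num_buffers)
// passes.  Deliberately imprecise; the result is only compared against a
// heuristic threshold anyway.
double merge_sort_cost_approx(double total_elems, double elem_size,
                              double num_buffers, double merge_fan_in) {
  const double io_size = 4096.0;          // placeholder for IO_SIZE
  const double time_for_compare = 500.0;  // placeholder constant
  double passes =
      std::ceil(std::log(num_buffers) / std::log(merge_fan_in));
  double cpu = total_elems * std::log(total_elems) / time_for_compare;
  double io = 2.0 * total_elems * elem_size * passes / io_size;
  return cpu + io;
}
```

Whether this is accurate enough I cannot say without measurements, but it
avoids the loop entirely.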

 > +  /* Simulate final merge_buff call. */
 > +  last_n_elems+= max_n_elems*num_buffers;
 > +  total_cost+= 2.0 * ((double)last_n_elems * elem_size) / IO_SIZE
 > +    + double(last_n_elems) * log(num_buffers + 1.0) /
 > +      (TIME_FOR_COMPARE_ROWID * M_LN2);

The same formula seems to be used several times here.  I think it
should be factored into a separate function.
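For example, something along these lines (a self-contained sketch;
kIoSize and kTimeForCompareRowid are placeholder values standing in for
IO_SIZE and TIME_FOR_COMPARE_ROWID from sql_const.h, and std::log(2.0)
replaces M_LN2 for portability of the sketch):

```cpp
#include <cassert>
#include <cmath>

static const double kIoSize = 4096.0;              // placeholder for IO_SIZE
static const double kTimeForCompareRowid = 500.0;  // placeholder constant

// Cost of one merge pass over num_elems elements of elem_size bytes, merging
// num_merge_sources sorted runs: an I/O term (read + write of all data) plus
// a CPU term (log2(num_merge_sources) comparisons per element).
double merge_pass_cost(double num_elems, double elem_size,
                       double num_merge_sources) {
  return 2.0 * (num_elems * elem_size) / kIoSize +
         num_elems * std::log(num_merge_sources) /
             (kTimeForCompareRowid * std::log(2.0));
}
```

The three places in get_merge_many_buffs_cost_fast() that repeat this
formula would then differ only in their arguments.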

 > +
 > +  return total_cost;
 > +}
 > +
 > +
 >
 > === added file 'sql/bounded_queue.h'
 > --- a/sql/bounded_queue.h	1970-01-01 00:00:00 +0000
 > +++ b/sql/bounded_queue.h	2010-10-21 14:12:43 +0000
 > @@ -0,0 +1,189 @@
 > +/* Copyright (c) 2010, Oracle and/or its affiliates. All rights 
reserved.
 > +
 > +   This program is free software; you can redistribute it and/or modify
 > +   it under the terms of the GNU General Public License as published by
 > +   the Free Software Foundation; version 2 of the License.
 > +
 > +   This program is distributed in the hope that it will be useful,
 > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
 > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 > +   GNU General Public License for more details.
 > +
 > +   You should have received a copy of the GNU General Public License
 > +   along with this program; if not, write to the Free Software
 > +   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 
02111-1307  USA */
 > +
 > +#ifndef BOUNDED_QUEUE_INCLUDED
 > +#define BOUNDED_QUEUE_INCLUDED
 > +
 > +#include<string.h>
 > +#include "my_global.h"
 > +#include "my_base.h"
 > +#include "my_sys.h"
 > +#include "queues.h"
 > +
 > +class Sort_param;
 > +
 > +/**
 > +  A priority queue with a fixed, limited size.
 > +
 > +  This is a wrapper on top of QUEUE and the queue_xxx() functions.
 > +  It keeps the top N elements which are inserted.

I think it should be explained a bit what is meant by top N here.
I also think it should be described what Element_type and Key_type
represent, and how they relate.

 > + */
 > +template<typename Element_type, typename Key_type>
 > +class Bounded_queue
 > +{
 > +public:
 > +  Bounded_queue()
 > +  {
 > +    memset(&m_queue, 0, sizeof(m_queue));
 > +  }
 > +
 > +  ~Bounded_queue()
 > +  {
 > +    delete_queue(&m_queue);
 > +  }
 > +
 > +  /**
 > +     Function for making sort-key from input data.
 > +     @param param Sort parameters.
 > +     @param to    Where to put the key.
 > +     @param from  The input data.
 > +  */
 > +  typedef void (*keymaker_function)(Sort_param *param,
 > +                                    Key_type *to,
 > +                                    Element_type *from);
 > +
 > +  /**
 > +     Function for comparing two keys.
 > +     @param  n Pointer to number of bytes to compare.
 > +     @param  a First key.
 > +     @parm   b Second key.
 > +     @retval a<=>  b.

Is it well known what <=> is supposed to mean?  At least, this is a
different meaning from the MySQL <=> operator.  Hence, I think this
should be more explicit.

 > +   */
 > +  typedef int (*compare_function)(size_t *v, Key_type **a, Key_type 
**b);
 > +
 > +  /**
 > +    Initialize the queue.
 > +
 > +    @param max_elements   The size of the queue.
 > +    @param offset_to_key  Offset to key in elements stored in the queue.

When is this needed?  So far, it does not seem to be used.  Do we
really need to expose this?  If yes, I think a better explanation is
needed, and the use of it should also be included in a unit test.

 > +    @param max_at_top     Set to 1 if you want biggest element on top.

Why is this parameter of type pbool?  This is, as far as I can tell,
the first use of pbool in the entire sql directory.  Why not use an
ordinary bool here?  What the underlying implementation uses should
not dictate what is exposed in the API.

 > +    @param compare        Compare function for elements, takes 3 
arguments.
 > +                          If NULL, we use 
get_ptr_compare(compare_length).
 > +    @parem compare_length Lenght of the data used for sorting.

It is a bit unclear to me what "length of the data" really means.  Are
we talking about the length of the keys to be compared, the length of
each record to be sorted, or the length of the entire data set?

Typo: @parem
Typo: Lenght


 > +    @param keymaker       Function which generates keys for elements.
 > +    @param sort_param     Sort parameters.
 > +    @param sort_keys      Array of pointers to keys to sort.
 > +
 > +    @retval 0 OK, 1 Could not allocate memory.
 > +
 > +    We do *not* take ownership of any of the input pointer arguments.
 > +   */
 > +  int init(ha_rows max_elements, uint offset_to_key, pbool max_at_top,
 > +           compare_function compare, size_t compare_length,
 > +           keymaker_function keymaker, Sort_param *sort_param,
 > +           Key_type **sort_keys);
 > +
 > +  /**
 > +    Pushes an element on the queue.
 > +    If the queue is already full, we discard the smallest (or
 > biggest) element.

Maybe you should rather define "top-N" and stick with that.  "smallest
(or biggest)" only spreads FUD ;-)

 > +
 > +    @param element        The element to be pushed.
 > +   */
 > +  void push(Element_type *element);
 > +
 > +  /**
 > +    Removes an element from the queue.

Any element? Maybe it should say top element or something.

 > +
 > +    @retval Pointer to the removed element.

But what is returned is Key_type, not Element_type.

 > +   */
 > +  Key_type **pop()
 > +  {
 > +    return (Key_type**) queue_remove(&m_queue, 0);
 > +  }
 > +
 > +  /**
 > +    The number of elements in the queue.
 > +   */
 > +  uint num_elements() const { return m_queue.elements; }
 > +
 > +  /**
 > +    Is the queue initialized?
 > +   */
 > +  bool is_initialized() const { return m_queue.max_elements>  0; }
 > +
 > +private:
 > +  Key_type         **m_sort_keys;
 > +  size_t             m_compare_length;
 > +  keymaker_function  m_keymaker;
 > +  Sort_param        *m_sort_param;
 > +  st_queue           m_queue;
 > +};
 > +
 > +
 > +template<typename Element_type, typename Key_type>
 > +int Bounded_queue<Element_type, Key_type>::init(ha_rows max_elements,
 > +                                                uint offset_to_key,
 > +                                                pbool max_at_top,
 > +                                                compare_function 
compare,
 > +                                                size_t compare_length,
 > +                                                keymaker_function 
keymaker,
 > +                                                Sort_param *sort_param,
 > +                                                Key_type **sort_keys)
 > +{
 > +  DBUG_ASSERT(sort_keys != NULL);
 > +
 > +  m_sort_keys=      sort_keys;
 > +  m_compare_length= compare_length;
 > +  m_keymaker=       keymaker;
 > +  m_sort_param=     sort_param;
 > +  // init_queue() takes an uint, and also does (max_elements + 1)
 > +  if (max_elements>= (UINT_MAX - 1))
 > +    return 1;
 > +  if (compare == NULL)
 > +    compare= 
reinterpret_cast<compare_function>(get_ptr_compare(compare_length));
 > +  // We allocate space for one extra element, for replace when queue 
is full.
 > +  return init_queue(&m_queue, (uint) max_elements + 1, 
offset_to_key, max_at_top,
 > + 
reinterpret_cast<queue_compare>(compare),&m_compare_length);
 > +}

I do not quite understand how using a default compare function with
a different number of arguments may work.  According to the
documentation for st_queue, the compare function should take 3
arguments.

 > +
 > +
 > +template<typename Element_type, typename Key_type>
 > +void Bounded_queue<Element_type, Key_type>::push(Element_type *element)
 > +{
 > +  DBUG_ASSERT(is_initialized());
 > +  if (queue_is_full((&m_queue)))
 > +  {
 > +    Key_type **pq_top= reinterpret_cast<Key_type 
**>(queue_top(&m_queue));
 > +    (*m_keymaker)(m_sort_param, *pq_top, element);
 > +    queue_replaced(&m_queue);

Should not the element be discarded if the key is greater than all
elements currently present?
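What I mean is roughly this (a self-contained stand-in:
std::push_heap/std::pop_heap over a vector instead of QUEUE, and
push_bounded as a hypothetical analogue of Bounded_queue::push() that
keeps the n smallest ints):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Keep the n smallest values.  When the queue is full, first compare the new
// value against the current worst (the max-heap top), and discard it outright
// if it would not displace anything -- avoiding the heap reorganization in
// that case.
void push_bounded(std::vector<int> &heap, size_t n, int val) {
  if (heap.size() < n) {
    heap.push_back(val);
    std::push_heap(heap.begin(), heap.end());  // max-heap: top is worst kept
  } else if (val < heap.front()) {             // only replace if it improves
    std::pop_heap(heap.begin(), heap.end());
    heap.back() = val;
    std::push_heap(heap.begin(), heap.end());
  }
  // else: discard -- no work needed
}
```

In the patch this would presumably mean generating the key first (into the
spare slot), comparing it to queue_top(), and only then calling
queue_replaced().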

 > +  } else {
 > +    (*m_keymaker)(m_sort_param, m_sort_keys[m_queue.elements], element);
 > +    queue_insert(&m_queue,
 > + 
reinterpret_cast<uchar*>(&m_sort_keys[m_queue.elements]));
 > +  }
 > +}
 > +
 > +
 > +/*
 > +  Calculate cost of merge sort
 > +
 > +    @param num_buffers    # of merge buffers
 > +    @param max_n_elems    max # of keys in memory buffer
 > +    @param last_n_elems   # of keys in last buffer
 > +    @param elem_size      Size of each element.
 > +
 > +    Calculates cost of merge sort by simulating call to 
merge_many_buff().
 > +
 > +  @retval
 > +    computed cost of merge sort

What is the unit for this cost value?

 > +
 > +  @note
 > +    Declared here in order to be able to unit test it,
 > +    since library dependencies have not been sorted out yet.
 > +*/
 > +double get_merge_many_buffs_cost_fast(ha_rows num_buffers, ha_rows 
max_n_elems,
 > +                                      ha_rows last_n_elems, uint 
elem_size);
 > +
 > +#endif  // BOUNDED_QUEUE_INCLUDED
 >
 > === modified file 'sql/filesort.cc'
 > --- a/sql/filesort.cc	2010-08-04 10:34:01 +0000
 > +++ b/sql/filesort.cc	2010-10-21 14:12:43 +0000
 > @@ -32,11 +32,27 @@
 >   #include "probes_mysql.h"
 >   #include "sql_test.h"                           // TEST_filesort
 >   #include "opt_range.h"                          // SQL_SELECT
 > +#include "bounded_queue.h"
 > +#include "sql_select.h"
 >
 >   #ifndef THREAD
 >   #define SKIP_DBUG_IN_FILESORT
 >   #endif
 >
 > +/*
 > +  How much Priority Queue sort is slower than qsort.
 > +  For qsort we have an average case complexity of O(n*log(n)) 
comparisons.
 > +  When using a priority queue, we have log(n) comparisons each time we
 > +  push a new element. For PQ we also call qsort at the end.

But this qsort at the end is for a smaller number of elements, so I do
not follow the logic here.

 > +  Measurements show that there is some extra overhead in QUEUE,
 > +  so we set it to 3.0 rather than 2.0.

I can accept that 3.0 is a good value, but I do not understand how 2.0
comes into the picture in the first place.

 > + */
 > +static double PQ_slowness= 3.0;

Should not this be const?
Why is the variable file scoped?  It is used in only one function.

 > +
 > +#ifdef HAVE_EXPLICIT_TEMPLATE_INSTANTIATION
 > +template class Bounded_queue<uchar, uchar>;
 > +#endif
 > +
 >   /// How to write record_ref.
 >   #define WRITE_REF(file,from) \
 >   if (my_b_write((file),(uchar*) (from),param->ref_length)) \

...


 > @@ -80,18 +132,17 @@ static void unpack_addon_fields(struct s
 >     The result set is stored in table->io_cache or
 >     table->record_pointers.
 >
 > -  @param thd           Current thread
 > -  @param table		Table to sort
 > -  @param sortorder	How to sort the table
 > -  @param s_length	Number of elements in sortorder
 > -  @param select		condition to apply to the rows
 > -  @param max_rows	Return only this many rows
 > -  @param sort_positions	Set to 1 if we want to force sorting by position
 > -			(Needed by UPDATE/INSERT or ALTER TABLE)
 > -  @param examined_rows	Store number of examined rows here
 > +  @param      thd            Current thread
 > +  @param      table          Table to sort
 > +  @param      sortorder      How to sort the table
 > +  @param      s_length       Number of elements in sortorder
 > +  @param      select         Condition to apply to the rows
 > +  @param      max_rows       Return only this many rows
 > +  @param      sort_positions Set to TRUE if we want to force sorting 
by position
 > +                             (Needed by UPDATE/INSERT or ALTER TABLE)
 > +  @param[out] examined_rows  Store number of examined rows here
 > +  @param      options        0 or OPTION_FOUND_ROWS

What is the point of having a generic options parameter when there is
only one option?  Why not just use a bool for this?

 >
 > -  @todo
 > -    check why we do this (param.keys--)
 >     @note
 >       If we sort by position (like if sort_positions is 1) filesort() 
will
 >       call table->prepare_for_position().

...

 > @@ -147,117 +199,98 @@ ha_rows filesort(THD *thd, TABLE *table,
 >     my_b_clear(&buffpek_pointers);
 >     buffpek=0;
 >     error= 1;
 > -  bzero((char*)&param,sizeof(param));
 > -  param.sort_length= sortlength(thd, sortorder, 
s_length,&multi_byte_charset);
 > +
 > +  param.init(sortlength(thd, sortorder, s_length,&multi_byte_charset),
 > +             table, thd, max_rows, sort_positions);
 >     /* filesort cannot handle zero-length records. */
 >     DBUG_ASSERT(param.sort_length);
 > -  param.ref_length= table->file->ref_length;
 > -  param.addon_field= 0;
 > -  param.addon_length= 0;
 > -  if (!(table->file->ha_table_flags()&  HA_FAST_KEY_READ)&&
 > -      !table->fulltext_searched&&  !sort_positions)
 > -  {
 > -    /*
 > -      Get the descriptors of all fields whose values are appended
 > -      to sorted fields and get its total length in param.spack_length.
 > -    */
 > -    param.addon_field= get_addon_fields(thd, table->field,
 > -                                        param.sort_length,
 > -&param.addon_length);
 > -  }
 >
 >     table_sort.addon_buf= 0;
 >     table_sort.addon_length= param.addon_length;
 >     table_sort.addon_field= param.addon_field;
 >     table_sort.unpack= unpack_addon_fields;
 > -  if (param.addon_field)
 > -  {
 > -    param.res_length= param.addon_length;
 > -    if (!(table_sort.addon_buf= (uchar *) my_malloc(param.addon_length,
 > -                                                    MYF(MY_WME))))
 > +  if (param.addon_field&&
 > +      !(table_sort.addon_buf=
 > +        (uchar *) my_malloc(param.addon_length, MYF(MY_WME))))
 >         goto err;

Why has this not been moved to Sort_param::init() like the rest?

...

 > -  }
 > -  else
 > -  {
 > -    param.res_length= param.ref_length;
 > -    /*
 > -      The reference to the record is considered
 > -      as an additional sorted field
 > -    */
 > -    param.sort_length+= param.ref_length;
 > -  }
 > -  param.rec_length= param.sort_length+param.addon_length;
 > -  param.max_rows= max_rows;
 >
 >     if (select&&  select->quick)
 > -  {
 >       status_var_increment(thd->status_var.filesort_range_count);
 > -  }
 >     else
 > -  {
 >       status_var_increment(thd->status_var.filesort_scan_count);
 > -  }
 > -#ifdef CAN_TRUST_RANGE
 > -  if (select&&  select->quick&& 
select->quick->records>  0L)
 > -  {
 > -    records=min((ha_rows) (select->quick->records*2+EXTRA_RECORDS*2),
 > -		table->file->stats.records)+EXTRA_RECORDS;
 > -    selected_records_file=0;
 > -  }
 > -  else
 > -#endif
 > -  {
 > -    records= table->file->estimate_rows_upper_bound();
 > -    /*
 > -      If number of records is not known, use as much of sort buffer
 > -      as possible.
 > -    */
 > -    if (records == HA_POS_ERROR)
 > -      records--;  // we use 'records+1' below.

No need for this anymore?

 > -    selected_records_file= 0;
 > -  }
 > +
 > +  /*
 > +    If number of rows is not known, use as much of sort buffer as 
possible.
 > +  */
 > +  num_rows= table->file->estimate_rows_upper_bound();
 >
 >     if (multi_byte_charset&&
 >         !(param.tmp_buffer= (char*) 
my_malloc(param.sort_length,MYF(MY_WME))))
 >       goto err;
 >
 >     memavl= thd->variables.sortbuff_size;
 > -  min_sort_memory= max(MIN_SORT_MEMORY, param.sort_length*MERGEBUFF2);
 > -  while (memavl>= min_sort_memory)
 > -  {
 > -    ulong old_memavl;
 > -    ulong keys= memavl/(param.rec_length+sizeof(char*));
 > -    param.keys=(uint) min(records+1, keys);
 > -    if ((table_sort.sort_keys=
 > -	 (uchar **) make_char_array((char **) table_sort.sort_keys,
 > -                                    param.keys, param.rec_length, 
MYF(0))))
 > -      break;
 > -    old_memavl=memavl;
 > -    if ((memavl=memavl/4*3)<  min_sort_memory&&  old_memavl> 
min_sort_memory)
 > -      memavl= min_sort_memory;
 > +
 > +  table_sort.sort_keys=
 > +    check_if_pq_applicable(&param,&table_sort, table, num_rows, memavl);
 > +  if (table_sort.sort_keys)
 > +  {
 > +    DBUG_PRINT("info", ("filesort PQ is applicable"));
 > +    const size_t compare_length= param.sort_length;
 > +    if (pq.init(param.max_rows, 0, 1,
 > +                NULL,
 > +                compare_length,
 > +&make_sortkey,&param, table_sort.sort_keys))
 > +    {
 > +      // If failed to init pq, fall back to merge-sort.
 > +      my_free(table_sort.sort_keys);
 > +      table_sort.sort_keys= NULL;

When may this happen?  If it is only when something is terribly wrong
(e.g., out-of-memory), I think a fall-back strategy just makes things
unnecessarily complex, and impossible to test.

 > +    }
 >     }
 > -  sort_keys= table_sort.sort_keys;
 > -  if (memavl<  min_sort_memory)
 > +  else
 > +    DBUG_PRINT("info", ("filesort PQ is not applicable"));

This will not catch the fall-back to merge-sort case.  Maybe it should
be moved to the if-block below.

 > +
 > +  if (!table_sort.sort_keys)

This could have been made into an else-statement if there was no
fall-back. Would make things simpler.

 >     {
 > -    my_error(ER_OUT_OF_SORTMEMORY,MYF(ME_ERROR+ME_WAITTANG));
 > -    goto err;
 > +    const ulong min_sort_memory=
 > +      max(MIN_SORT_MEMORY, param.sort_length*MERGEBUFF2);
 > +    while (memavl>= min_sort_memory)
 > +    {
 > +      ulong old_memavl;
 > +      ulong keys= memavl/(param.rec_length+sizeof(char*));
 > +      param.num_keys_per_buffer= (uint) min(num_rows, keys);
 > +      if ((table_sort.sort_keys=
 > +           make_char_array(table_sort.sort_keys,
 > +                           param.num_keys_per_buffer, 
param.rec_length)))
 > +        break;
 > +      old_memavl=memavl;
 > +      if ((memavl=memavl/4*3)<  min_sort_memory&&  old_memavl> 
min_sort_memory)
 > +        memavl= min_sort_memory;
 > +    }
 > +    if (memavl<  min_sort_memory)
 > +    {
 > +      my_error(ER_OUT_OF_SORTMEMORY,MYF(ME_ERROR+ME_WAITTANG));
 > +      goto err;
 > +    }
 >     }
 > +
 > +  sort_keys= table_sort.sort_keys;
 >     if (open_cached_file(&buffpek_pointers,mysql_tmpdir,TEMP_PREFIX,
 >   		       DISK_BUFFER_SIZE, MYF(MY_WME)))
 >       goto err;
 >
 > -  param.keys--;  			/* TODO: check why we do this */
 >     param.sort_form= table;
 >     param.end=(param.local_sortorder=sortorder)+s_length;
 > -  if ((records=find_all_keys(&param,select,sort_keys,&buffpek_pointers,
 > -			&tempfile, selected_records_file)) ==
 > +  if ((num_rows= find_all_keys(&param, select, 
sort_keys,&buffpek_pointers,
 > +&tempfile,
 > +                               pq.is_initialized() ?&pq : NULL,

I think things would be a bit more readable if the above line was done
first in a separate statement.

 > +&found_rows)) ==
 >         HA_POS_ERROR)
 >       goto err;
 >     maxbuffer= (uint) (my_b_tell(&buffpek_pointers)/sizeof(*buffpek));
 >
 >     if (maxbuffer == 0)			// The whole set is in memory
 >     {
 > -    if (save_index(&param,sort_keys,(uint) records,&table_sort))
 > +    if (save_index(&param, sort_keys, (uint) num_rows,&table_sort))
 >         goto err;
 >     }
 >     else

...

 > @@ -299,9 +333,17 @@ ha_rows filesort(THD *thd, TABLE *table,
 >   		    outfile))
 >         goto err;
 >     }
 > -  if (records>  param.max_rows)
 > -    records=param.max_rows;
 > -  error =0;
 > +
 > +  if (pq.is_initialized()&&  (options&  OPTION_FOUND_ROWS))
 > +  {
 > +    DBUG_PRINT("info", ("PQ and OPTION_FOUND_ROWS !!"));
 > +  }

I would suggest a more descriptive text.  And why two "!"?

 > +
 > +  if (options&  OPTION_FOUND_ROWS)
 > +    num_rows= found_rows;
 > +  else if (num_rows>  param.max_rows)
 > +    num_rows= param.max_rows;

I think a comment explaining the significance of the above would be
good.

 > +  error= 0;
 >
 >    err:
 >     my_free(param.tmp_buffer);

...

 > @@ -366,23 +409,22 @@ void filesort_free_buffers(TABLE *table,
 >
 >   /** Make a array of string pointers. */
 >
 > -static char **make_char_array(char **old_pos, register uint fields,
 > -                              uint length, myf my_flag)
 > +static uchar **make_char_array(uchar **old_pos, uint fields, uint 
length)
 >   {
 > -  register char **pos;
 > -  char *char_pos;
 >     DBUG_ENTER("make_char_array");
 >
 >     if (old_pos ||
 > -      (old_pos= (char**) my_malloc((uint) fields*(length+sizeof(char*)),
 > -				   my_flag)))
 > +      (old_pos=
 > +       (uchar**) my_malloc(fields * (length + sizeof(char*)), MYF(0))))

Why sizeof(char*) here?  Would it be more consistent with uchar*?

 >     {
 > -    pos=old_pos; char_pos=((char*) (pos+fields)) -length;
 > -    while (fields--) *(pos++) = (char_pos+= length);
 > +    uchar **pos= old_pos;
 > +    uchar *char_pos= ((uchar*) (pos+fields)) - length;
 > +    while (fields--)
 > +      *(pos++) = (char_pos+= length);
 >     }
 >
 >     DBUG_RETURN(old_pos);
 > -} /* make_char_array */
 > +}
 >
 >
 >   /** Read 'count' number of buffer pointers into memory. */
 > @@ -470,20 +512,26 @@ static void dbug_print_record(TABLE *tab
 >     @param buffpek_pointers  File to write BUFFPEKs describing sorted 
segments
 >                              in tempfile.
 >     @param tempfile          File to write sorted sequences of 
sortkeys to.
 > -  @param indexfile         If !NULL, use it for source data 
(contains rowids)
 > +  @param pq                If !NULL, use it for keeping top N elements
 > +  @param [out] found_rows  The number of FOUND_ROWS().
 >
 >     @note
 >       Basic idea:
 >       @verbatim
 >        while (get_next_sortkey())
 >        {
 > -       if (no free space in sort_keys buffers)
 > +       if (using priority queue)
 > +         push sort key into queue
 > +       else
 >          {
 > -         sort sort_keys buffer;
 > -         dump sorted sequence to 'tempfile';
 > -         dump BUFFPEK describing sequence location into 
'buffpek_pointers';
 > +         if (no free space in sort_keys buffers)
 > +         {
 > +           sort sort_keys buffer;
 > +           dump sorted sequence to 'tempfile';
 > +           dump BUFFPEK describing sequence location into 
'buffpek_pointers';
 > +         }
 > +         put sort key into 'sort_keys';
 >          }
 > -       put sort key into 'sort_keys';
 >        }

Does the first line of the function comment need updating?  I guess you
are not always writing into a tempfile anymore?

 >        if (sort_keys has some elements&&  dumped at least once)
 >          sort-dump-dump as above;

...

 > @@ -577,16 +628,6 @@ static ha_rows find_all_keys(SORTPARAM *
 >       }
 >       else					/* Not quick-select */
 >       {
 > -      if (indexfile)
 > -      {
 > -	if (my_b_read(indexfile,(uchar*) ref_pos,ref_length)) /* purecov: 
deadcode */
 > -	{
 > -	  error= my_errno ? my_errno : -1;		/* Abort */
 > -	  break;
 > -	}
 > -	error= file->ha_rnd_pos(sort_form->record[0], next_pos);
 > -      }
 > -      else
 >         {

I appreciate that you have tried to reduce the amount of diff for the
reviewer, but it would be good to reformat such things in a follow-up
patch.


 >   	error= file->ha_rnd_next(sort_form->record[0]);
 >   	if (!flag)

...

 > @@ -616,14 +657,23 @@ static ha_rows find_all_keys(SORTPARAM *
 >       if (!error&&  (!select ||
 >                      (!select->skip_record(thd,&skip_record)&& 
!skip_record)))
 >       {
 > -      if (idx == param->keys)
 > +      ++(*found_rows);
 > +      if (pq)
 >         {
 > -	if (write_keys(param,sort_keys,idx,buffpek_pointers,tempfile))
 > -	  DBUG_RETURN(HA_POS_ERROR);
 > -	idx=0;
 > -	indexpos++;
 > +        pq->push(ref_pos);
 > +        idx= pq->num_elements();
 > +      }
 > +      else
 > +      {
 > +        if (idx == param->num_keys_per_buffer)

I think I prefer "else if" but I guess that is a matter of taste.

 > +        {
 > +          if (write_keys(param,sort_keys, idx, buffpek_pointers, 
tempfile))
 > +             DBUG_RETURN(HA_POS_ERROR);
 > +          idx= 0;
 > +          indexpos++;
 > +        }
 > +        make_sortkey(param, sort_keys[idx++], ref_pos);
 >         }
 > -      make_sortkey(param,sort_keys[idx++],ref_pos);
 >       }
 >       else
 >         file->unlock_row();

...

 > @@ -1036,17 +1088,17 @@ static void register_used_fields(SORTPAR
 >   }
 >
 >
 > -static bool save_index(SORTPARAM *param, uchar **sort_keys, uint count,
 > +static bool save_index(Sort_param *param, uchar **sort_keys, uint count,
 >                          FILESORT_INFO *table_sort)
 >   {
 >     uint offset,res_length;
 >     uchar *to;
 >     DBUG_ENTER("save_index");
 >
 > -  my_string_ptr_sort((uchar*) sort_keys, (uint) count, 
param->sort_length);
 > +  my_string_ptr_sort((uchar*) sort_keys, count, param->sort_length);
 >     res_length= param->res_length;
 >     offset= param->rec_length-res_length;
 > -  if ((ha_rows) count>  param->max_rows)
 > +  if ((ha_rows) count>  param->max_rows&&  param->max_rows>  0)

Should param->max_rows ever be <= 0? If not, I would prefer an assert
here.

 >       count=(uint) param->max_rows;
 >     if (!(to= table_sort->record_pointers=
 >           (uchar*) my_malloc(res_length*count, MYF(MY_WME))))
 > @@ -1060,10 +1112,127 @@ static bool save_index(SORTPARAM *param,
 >   }
 >
 >
 > +/**
 > +  Test whether priority queue is worth using to get top elements of an
 > +  ordered result set. If it is then allocates buffer for required 
amount of

I suggest: "If it is, ..."

It is not clear to me, why it is a good idea to do the allocation of
the buffer within this function.  It seems it leads to unnecessary
duplication of calls to make_char_array().


 > +  records
 > +
 > +  @param param     Sort parameters.
 > +  @param info      Filesort information.
 > +  @param table     Table to sort.
 > +  @param num_rows  Estimate of number of rows in source record set.

If this is just an estimate, do you risk that the buffer will be too
small?  What happens then?

 > +  @param memavl    Memory available for sorting.

I think memavl is a bit cryptic.

 > +
 > +  DESCRIPTION

I think you should consider rewriting the description below.  I do not
feel it clearly describes the logic of the function.

 > +    Function tests applicability of PQ to get L top records for 
queries like

"L top records". L should be defined.  The example in the line below
uses max_rows for L it seems.

 > +    SELECT ... FROM t ORDER BY a1,...,an LIMIT max_rows;
 > +    It tests whether there is enough memory to contain max_rows 
elements,
 > +    whether max_rows is small enough to make PQ perform faster that 
qsort

Is there a missing "and" after comma, or is this the start of a list
that was not completed?

 > +    on table containing 'num_rows' records. If available memory is 
enough to

"on table". Above "source record set" is used. I think same term
should be used in both places.

 > +    store only 'max_rows' sort keys without addon data, it modifies 
'table'

I would say, "is enough to store 'max_rows' sort keys, but not all
addon data"

 > +    to sort with references to tuples instead of additional data. In 
evaluations
 > +    it uses 'param' to get sort key, addon data and reference sizes, 
'table' -
 > +    to get table scan time.
 > +
 > +   @retval
 > +    allocated buffer - if it's ok to use PQ
 > +    NULL - PQ will be slower than merge-sort, or there is not
 > enough memory.

Is this return value according to the coding rules we discussed in
optimizer team?  (return value !=0 means error)

 > +*/
 > +
 > +uchar **check_if_pq_applicable(Sort_param *param, FILESORT_INFO *info,
 > +                               TABLE *table, ha_rows num_rows, ulong 
memavl)
 > +{
 > +  DBUG_ENTER("check_if_pq_applicable");
 > +
 > +  if (param->max_rows == HA_POS_ERROR)
 > +  {
 > +    DBUG_PRINT("info", ("max_rows = HA_POS_ERROR"));
 > +    DBUG_RETURN(NULL);

Can this happen?

 > +  }
 > +
 > +  if (param->max_rows + 2>= UINT_MAX)
 > +  {
 > +    DBUG_PRINT("info", ("Too large LIMIT"));

Please, use a more specific text here.

 > +    DBUG_RETURN(NULL);
 > +  }
 > +
 > +  uchar **sort_keys;

Seems to me that the code below would be simpler if you initialized
this to NULL, and just had a single return at the end.

 > +  ulong num_available_keys= memavl/(param->rec_length + sizeof(char*));
 > +  param->num_keys_per_buffer= (uint) param->max_rows + 1;
 > +  if (num_rows<  num_available_keys)
 > +  {
 > +    /* The whole source set fits into memory. */
 > +    if (param->max_rows<  num_rows/PQ_slowness )

Some comment on PQ_slowness would be good here.

 > +    {
 > +      if ((sort_keys= make_char_array(info->sort_keys,
 > +                                      param->num_keys_per_buffer,
 > +                                      param->rec_length)))
 > +        DBUG_RETURN(sort_keys);
 > +    }
 > +    else
 > +    {
 > +      /* PQ will be slower */
 > +      DBUG_RETURN(NULL);
 > +    }
 > +  }
 > +
 > +  if (param->max_rows + 1<  num_available_keys&&

Why +1?

 > +      (sort_keys= make_char_array(info->sort_keys,
 > +                                  param->num_keys_per_buffer,
 > +                                  param->rec_length)))
 > +    DBUG_RETURN(sort_keys);
 > +
 > +  /* Try to strip off addon fields */
 > +  if (param->addon_field)
 > +  {
 > +    const ulong row_lenght=

Typo: row_lenght

 > +      param->sort_length + param->ref_length + sizeof(char*);

I would like a comment explaining what these 3 parts of the row
represent.

 > +    num_available_keys= memavl / row_lenght;
 > +
 > +    if (param->max_rows + 1<  num_available_keys)
 > +    {
 > +      const double merge_cost=
 > +        get_merge_many_buffs_cost_fast(num_rows / num_available_keys,
 > +                                       num_available_keys,
 > +                                       num_rows % num_available_keys,
 > +                                       row_lenght);

I think the division/remainder logic could be kept within the cost
function.  Is there a need to burden the caller with that?

 > +      const double pq_cost=
 > +        (PQ_slowness * num_rows + param->max_rows + 1)*
 > +        log(param->max_rows + 1.0) / TIME_FOR_COMPARE_ROWID +
 > +        param->max_rows * table->file->scan_time();

I think this cost function needs an explanation.  For one thing, why
is PQ_slowness placed where it is?

 > +
 > +      if (merge_cost<  pq_cost)
 > +        DBUG_RETURN(NULL);
 > +
 > +      sort_keys= make_char_array(info->sort_keys,
 > +                                 param->num_keys_per_buffer,
 > +                                 param->rec_length);
 > +      if (sort_keys)
 > +      {
 > +        // Make attached data to be references instead of fields.
 > +        my_free(info->addon_buf);
 > +        my_free(info->addon_field);
 > +        info->addon_buf= NULL;
 > +        info->addon_field= 0;

Seems a bit inconsistent to set one pointer to NULL and the other to
0.

 > +        param->addon_field= NULL;
 > +        param->addon_length= 0;
 > +
 > +        param->res_length= param->ref_length;
 > +        param->sort_length+= param->ref_length;
 > +        param->rec_length= param->sort_length + param->addon_length;

Why do you add param->addon_length?  It has just been set to 0.

...

 > === modified file 'sql/filesort.h'
 > --- a/sql/filesort.h	2010-07-02 02:58:51 +0000
 > +++ b/sql/filesort.h	2010-10-21 14:12:43 +0000
 > @@ -21,6 +21,7 @@ class SQL_SELECT;
 >   #include "my_global.h"                          /* uint, uchar */
 >   #include "my_base.h"                            /* ha_rows */
 >
 > +class JOIN;

I do not think this is needed.  I see no reference to JOIN in this
file.

 >   class SQL_SELECT;
 >   class THD;
 >   struct TABLE;
 > @@ -29,7 +30,7 @@ typedef struct st_sort_field SORT_FIELD;
 >   ha_rows filesort(THD *thd, TABLE *table, st_sort_field *sortorder,
 >                    uint s_length, SQL_SELECT *select,
 >                    ha_rows max_rows, bool sort_positions,
 > -                 ha_rows *examined_rows);
 > +                 ha_rows *examined_rows, ulonglong options);
 >   void filesort_free_buffers(TABLE *table, bool full);
 >   void change_double_for_sort(double nr,uchar *to);
 >
 >

...

 > === modified file 'sql/sql_select.cc'

...

 > @@ -3249,12 +3253,21 @@ JOIN::exec()
 >   	the query. XXX: it's never shown in EXPLAIN!
 >   	OPTION_FOUND_ROWS supersedes LIMIT and is taken into account.
 >         */
 > -      if (create_sort_index(thd, curr_join,
 > -			    curr_join->group_list ?
 > -			    curr_join->group_list : curr_join->order,
 > -			    curr_join->select_limit,
 > -			    (select_options&  OPTION_FOUND_ROWS ?
 > -			     HA_POS_ERROR : unit->select_limit_cnt),
 > +      DBUG_PRINT("info",("Sorting for order by/group by"));
 > +      ORDER *order_arg=
 > +        curr_join->group_list ? curr_join->group_list : 
curr_join->order;
 > +      const ha_rows filesort_limit_arg=
 > +        (has_group_by || curr_join->tables>  1)
 > +        ? curr_join->select_limit : unit->select_limit_cnt;

I think a comment with an explanation on filesort_limit_arg would be
good.

 > +      const ha_rows select_limit_arg=
 > +        select_options&  OPTION_FOUND_ROWS
 > +        ? HA_POS_ERROR : unit->select_limit_cnt;
 > +
 > +      if (create_sort_index(thd,
 > +                            curr_join,
 > +                            order_arg,
 > +                            filesort_limit_arg,
 > +                            select_limit_arg,
 >                               curr_join->group_list ? FALSE : TRUE))
 >   	DBUG_VOID_RETURN;
 >         sortorder= curr_join->sortorder;

...

 > @@ -18368,8 +18390,26 @@ end_send(JOIN *join, JOIN_TAB *join_tab
 >         error=join->result->send_data(*join->fields);
 >       if (error)
 >         DBUG_RETURN(NESTED_LOOP_ERROR); /* purecov: inspected */
 > -    if (++join->send_records>= join->unit->select_limit_cnt&&
 > -	join->do_send_rows)
 > +
 > +    ++join->send_records;
 > +    if (join->send_records>= join->unit->select_limit_cnt&&
 > +        !join->do_send_rows)
 > +    {
 > +      /*
 > +        If filesort is used for sorting, stop after select_limit_cnt+1
 > +        records are read. Because of optimization in some cases it can
 > +        provide only select_limit_cnt+1 records.
 > +      */
 > +      if (join->order&&
 > +          join->sortorder&&
 > +          join->select_options&  OPTION_FOUND_ROWS)
 > +      {
 > +        DBUG_PRINT("info", ("filesort NESTED_LOOP_QUERY_LIMIT"));
 > +        DBUG_RETURN(NESTED_LOOP_QUERY_LIMIT);
 > +      }
 > +    }
 > +    if (join->send_records>= join->unit->select_limit_cnt&&
 > +        join->do_send_rows)

Seems like things would be a bit simpler with a pattern like this:

   if (join->send_records >= join->unit->select_limit_cnt)
     if (join->do_send_rows)
       ...
     else
       ...

 >       {
 >         if (join->select_options&  OPTION_FOUND_ROWS)
 >         {

...

 > @@ -20118,9 +20158,11 @@ create_sort_index(THD *thd, JOIN *join,
 >
 >     if (table->s->tmp_table)
 >       table->file->info(HA_STATUS_VARIABLE);	// Get record count
 > -  table->sort.found_records=filesort(thd, table,join->sortorder, length,
 > -                                     select, filesort_limit, 0,
 > -&examined_rows);
 > +  table->sort.found_records= filesort(thd, table,join->sortorder, 
length,

Missing space after comma in line above.

 > +                                      select, filesort_limit, 0,
 > +&examined_rows,
 > +                                      join->tables != 1 ? 0 :
 > +                                      join->select_options& 
OPTION_FOUND_ROWS);

Would things go wrong if OPTION_FOUND_ROWS were passed in also when
join->tables != 1?

 >     tab->records= table->sort.found_records;	// For SQL_CALC_ROWS
 >     if (select)
 >     {

...

 > === modified file 'sql/sql_sort.h'
 > --- a/sql/sql_sort.h	2010-07-02 18:15:21 +0000
 > +++ b/sql/sql_sort.h	2010-10-21 14:12:43 +0000
 > @@ -26,7 +26,7 @@ typedef struct st_sort_field SORT_FIELD;
 >
 >   class Field;
 >   struct TABLE;
 > -
 > +class THD;

The only reason THD is needed here seems to be to get the value of
thd->variables.max_length_for_sort_data in a function called
indirectly by init.  Would it not be better to just provide that
information explicitly?

 >
 >   /* Defines used by filesort and uniques */
 >
 > @@ -71,15 +71,17 @@ struct BUFFPEK_COMPARE_CONTEXT
 >     void *key_compare_arg;
 >   };
 >
 > -typedef struct st_sort_param {
 > -  uint rec_length;          /* Length of sorted records */
 > -  uint sort_length;			/* Length of sorted columns */
 > -  uint ref_length;			/* Length of record ref. */
 > -  uint addon_length;        /* Length of added packed fields */
 > -  uint res_length;          /* Length of records in final sorted 
file/buffer */
 > -  uint keys;				/* Max keys / buffer */
 > -  ha_rows max_rows,examined_rows;
 > -  TABLE *sort_form;			/* For quicker make_sortkey */
 > +class Sort_param {
 > +public:
 > +  uint rec_length;            /* Length of sorted records */
 > +  uint sort_length;           /* Length of sorted columns */
 > +  uint ref_length;            /* Length of record ref. */
 > +  uint addon_length;          /* Length of added packed fields */
 > +  uint res_length;            /* Length of records in final sorted 
file/buffer */
 > +  uint num_keys_per_buffer;   /* Max keys / buffer */

num keys or max keys?  Name and description do not match.

 > +  ha_rows max_rows;           /* Select limit, or HA_POS_ERROR if 
unlimited */
 > +  ha_rows examined_rows;      /* Number of examined rows */
 > +  TABLE *sort_form;           /* For quicker make_sortkey */
 >     SORT_FIELD *local_sortorder;
 >     SORT_FIELD *end;
 >     SORT_ADDON_FIELD *addon_field; /* Descriptors for companion fields */
 > @@ -89,15 +91,22 @@ typedef struct st_sort_param {
 >     /* The fields below are used only by Unique class */
 >     qsort2_cmp compare;
 >     BUFFPEK_COMPARE_CONTEXT cmp_context;
 > -} SORTPARAM;
 > +
 > +  Sort_param()
 > +  {
 > +    memset(this, 0, sizeof(*this));

This makes this header file not self-contained.  It does not import
the definition of memset, AFAICT.

 > +  }
 > +  void init(uint sortlen, TABLE *table, THD *thd,
 > +            ha_rows maxrows, bool sort_positions);
 > +};
 >
 >
 > -int merge_many_buff(SORTPARAM *param, uchar *sort_buffer,
 > +int merge_many_buff(Sort_param *param, uchar *sort_buffer,
 >   		    BUFFPEK *buffpek,
 >   		    uint *maxbuffer, IO_CACHE *t_file);
 >   uint read_to_buffer(IO_CACHE *fromfile,BUFFPEK *buffpek,
 >   		    uint sort_length);
 > -int merge_buffers(SORTPARAM *param,IO_CACHE *from_file,
 > +int merge_buffers(Sort_param *param,IO_CACHE *from_file,
 >   		  IO_CACHE *to_file, uchar *sort_buffer,
 >   		  BUFFPEK *lastbuff,BUFFPEK *Fb,
 >   		  BUFFPEK *Tb,int flag);

You have fixed similar errors in indentation in other places.  Why not
here?

...

 > === modified file 'sql/uniques.cc'
 > --- a/sql/uniques.cc	2010-07-08 21:42:23 +0000
 > +++ b/sql/uniques.cc	2010-10-21 14:12:43 +0000
 > @@ -577,7 +577,6 @@ bool Unique::walk(tree_walk_action actio
 >
 >   bool Unique::get(TABLE *table)
 >   {
 > -  SORTPARAM sort_param;
 >     table->sort.found_records=elements+tree.elements_in_tree;
 >
 >     if (my_b_tell(&file) == 0)
 > @@ -612,19 +611,20 @@ bool Unique::get(TABLE *table)
 >       return 1;
 >     reinit_io_cache(outfile,WRITE_CACHE,0L,0,0);
 >
 > -  bzero((char*)&sort_param,sizeof(sort_param));
 > +  Sort_param sort_param;
 >     sort_param.max_rows= elements;
 >     sort_param.sort_form=table;
 >     sort_param.rec_length= sort_param.sort_length= sort_param.ref_length=
 >       size;
 > -  sort_param.keys= (uint) (max_in_memory_size / sort_param.sort_length);
 > +  sort_param.num_keys_per_buffer=
 > +    (uint) (max_in_memory_size / sort_param.sort_length);
 >     sort_param.not_killable=1;
 >
 > -  if (!(sort_buffer=(uchar*) my_malloc((sort_param.keys+1) *
 > +  if (!(sort_buffer=(uchar*) 
my_malloc((sort_param.num_keys_per_buffer + 1) *
 >   				       sort_param.sort_length,
 >   				       MYF(0))))
 >       return 1;
 > -  sort_param.unique_buff= sort_buffer+(sort_param.keys*
 > +  sort_param.unique_buff= sort_buffer+(sort_param.num_keys_per_buffer *
 >   				       sort_param.sort_length);
 >
 >     sort_param.compare= (qsort2_cmp) buffpek_compare;

It seems a bit inconsistent to introduce an init-function for
Sort_param and only use it in 1 of 2 places where sort_param is used.

...

 > === added file 'unittest/gunit/bounded_queue-t.cc'
 > --- a/unittest/gunit/bounded_queue-t.cc	1970-01-01 00:00:00 +0000
 > +++ b/unittest/gunit/bounded_queue-t.cc	2010-10-21 14:12:43 +0000
 > @@ -0,0 +1,423 @@
 > +/* Copyright (c) 2010, Oracle and/or its affiliates. All rights 
reserved.
 > +
 > +   This program is free software; you can redistribute it and/or modify
 > +   it under the terms of the GNU General Public License as published by
 > +   the Free Software Foundation; version 2 of the License.
 > +
 > +   This program is distributed in the hope that it will be useful,
 > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
 > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 > +   GNU General Public License for more details.
 > +
 > +   You should have received a copy of the GNU General Public License
 > +   along with this program; if not, write to the Free Software
 > +   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 
02111-1307  USA */
 > +
 > +// First include (the generated) my_config.h, to get correct 
platform defines,
 > +// then gtest.h (before any other MySQL headers), to avoid min() 
macros etc ...
 > +#include "my_config.h"
 > +#include<gtest/gtest.h>
 > +#include<algorithm>
 > +#include<stddef.h>
 > +
 > +#include "bounded_queue.h"
 > +#include "my_sys.h"
 > +
 > +namespace {
 > +
 > +const int num_elements= 14;
 > +
 > +// A simple helper function to determine array size.
 > +template<class T, int size>
 > +int array_size(const T (&)[size])
 > +{
 > +  return size;
 > +}
 > +
 > +
 > +/*
 > +  Elements to be sorted by tests below.
 > +  We put some data in front of 'val' to verify (when debugging)
 > +  that all the reinterpret_casts involved when using QUEUE are correct.
 > +*/
 > +struct Test_element
 > +{
 > +  Test_element()      { *this= -1; }
 > +  Test_element(int i) { *this= i; }
 > +
 > +  Test_element&operator=(int i)
 > +  {
 > +    val= i;
 > +    snprintf(text, array_size(text), "%4d", i);
 > +    return *this;
 > +  }
 > +
 > +  char text[8]; // Some data.
 > +  int  val;     // The value we use for generating the key.
 > +};
 > +
 > +
 > +/*
 > +  The key, which is actually sorted by queue_xxx() functions.
 > +  We sort on the key only.
 > + */
 > +struct Test_key
 > +{
 > +  Test_key() : element(NULL), key(-1) {}
 > +
 > +  Test_element *element; // The actual data element.
 > +  int           key;     // The generated key for the data element.
 > +};
 > +
 > +
 > +/*
 > +  Comparison function for Test_key objects.
 > + */
 > +extern "C"

Declaring it as extern "C" gives a warning (Anachronism) with SUN
compiler.  I guess it is because Bounded_queue does not declare the
compare function as extern "C".

 > +int test_key_compare(size_t *cmp_arg, Test_key **a, Test_key **b)
 > +{
 > +  EXPECT_EQ(*cmp_arg, sizeof(int));
 > +
 > +  int a_num= (*a)->key;
 > +  int b_num= (*b)->key;
 > +
 > +  if (a_num>  b_num)
 > +    return +1;
 > +  if (a_num<  b_num)
 > +    return -1;
 > +  return 0;
 > +}
 > +
 > +
 > +/*
 > +  Generates a Test_key for a given Test_element.
 > + */
 > +void test_keymaker(Sort_param *sp, Test_key *key, Test_element *element)

I guess making a unit test that actually uses Sort_param would be a
bit awkward?

 > +{
 > +  int key_val= element->val;

Why do you need this local variable?

 > +
 > +  key->element= element;
 > +  key->key= key_val;
 > +}
 > +
 > +
 > +/*
 > +  A struct to wrap the actual keys, and an array of pointers to the 
keys.
 > + */
 > +template<int sz, typename Key_type>
 > +struct Key_container
 > +{
 > +  Key_container()
 > +  {
 > +    for (int ix= 0; ix<= sz; ++ix)
 > +      key_ptrs[ix]=&key_data[ix];
 > +  }
 > +
 > +  Key_type *key_ptrs[sz+1];
 > +  Key_type  key_data[sz+1];
 > +};
 > +
 > +
 > +class BoundedQueueTest : public ::testing::Test
 > +{
 > +protected:
 > +  BoundedQueueTest() : m_key_size(sizeof(int))
 > +  {
 > +  }
 > +
 > +  virtual void SetUp()
 > +  {
 > +    int ix;
 > +    for (ix=0; ix<  array_size(m_test_data); ++ix)
 > +      m_test_data[ix]= ix;
 > + 
std::random_shuffle(&m_test_data[0],&m_test_data[array_size(m_test_data)]);
 > +  }
 > +
 > +  // Key pointers and data, used by the queue_xxx() functions.
 > +  Key_container<num_elements / 2, Test_key>  m_keys;
 > +
 > +  // Some random intput data, to be sorted.
 > +  Test_element  m_test_data[num_elements];
 > +
 > +  size_t m_key_size;
 > +  Bounded_queue<Test_element, Test_key>  m_queue;
 > +private:
 > +  GTEST_DISALLOW_COPY_AND_ASSIGN_(BoundedQueueTest);
 > +};
 > +
 > +
 > +// Google Test recommends DeathTest suffix for classes used in death 
tests.
 > +typedef BoundedQueueTest BoundedQueueDeathTest;
 > +
 > +#if !defined(DBUG_OFF)
 > +/*
 > +  Verifies that we DBUG_ASSERT if trying to push to an 
un-initialized queue.
 > + */
 > +TEST_F(BoundedQueueDeathTest, die_if_not_initialized)
 > +{
 > +  ::testing::FLAGS_gtest_death_test_style = "threadsafe";
 > +  Test_element foo= 1;
 > +  EXPECT_DEATH_IF_SUPPORTED(m_queue.push(&foo),
 > +                            ".*Assertion .*is_initialized.*");
 > +}
 > +#endif  // !defined(DBUG_OFF)
 > +
 > +
 > +/*
 > +  Verifies that construct, initialize, destroy works.
 > + */
 > +TEST_F(BoundedQueueTest, construct_and_destruct)
 > +{
 > +  EXPECT_EQ(0, m_queue.init(num_elements/2, 0, TRUE,
 > +                            test_key_compare,
 > +                            m_key_size,
 > +&test_keymaker, NULL, m_keys.key_ptrs));
 > +}
 > +
 > +
 > +/*
 > +  Verifies that we reject too large queues.
 > + */
 > +TEST_F(BoundedQueueTest, too_many_elements)
 > +{
 > +  EXPECT_EQ(1, m_queue.init(UINT_MAX, 0, TRUE,
 > +                            test_key_compare,
 > +                            m_key_size,
 > +&test_keymaker, NULL, m_keys.key_ptrs));
 > +  EXPECT_EQ(1, m_queue.init(UINT_MAX - 1, 0, TRUE,
 > +                            test_key_compare,
 > +                            m_key_size,
 > +&test_keymaker, NULL, m_keys.key_ptrs));
 > +}
 > +
 > +
 > +/*
 > +  Verifies that push and bounded size works, and that pop() gives 
sorted order.
 > + */
 > +TEST_F(BoundedQueueTest, push_and_pop)
 > +{
 > +  EXPECT_EQ(0, m_queue.init(num_elements/2, 0, TRUE, test_key_compare,
 > +                            m_key_size,
 > +                            &test_keymaker, NULL, m_keys.key_ptrs));
 > +  for (int ix= 0; ix < array_size(m_test_data); ++ix)
 > +  {
 > +    m_queue.push(&m_test_data[ix]);
 > +  }
 > +  /*
 > +   The queue should now contain [0 .. (num_elements/2 - 1)]
 > +   plus one extra element that we are not interested in.

I do not think this "not interesting" aspect of Bounded_queue is
documented.  Seems a bit awkward.

 > +  */
 > +  (void) m_queue.pop();
 > +  int expected_key_val= m_queue.num_elements() - 1;
 > +  while (m_queue.num_elements() > 0)
 > +  {
 > +    Test_key **top= m_queue.pop();
 > +    int key_val= (*top)->key;
 > +    EXPECT_EQ(expected_key_val, key_val);
 > +    Test_element *element= (*top)->element;
 > +    EXPECT_EQ(expected_key_val, element->val);
 > +    --expected_key_val;
 > +  }
 > +}
 > +
 > +
 > +/*
 > +  Verifies that push, with bounded size, followed by sort() works.
 > + */
 > +TEST_F(BoundedQueueTest, insert_and_sort)
 > +{
 > +  EXPECT_EQ(0, m_queue.init(num_elements/2, 0, TRUE, test_key_compare,
 > +                            m_key_size,
 > +                            &test_keymaker, NULL, m_keys.key_ptrs));
 > +  for (int ix= 0; ix < array_size(m_test_data); ++ix)
 > +  {
 > +    m_queue.push(&m_test_data[ix]);
 > +  }
 > +
 > +  uchar *base=  (uchar*)&m_keys.key_ptrs[0];
 > +  uint   items= m_queue.num_elements();
 > +  size_t size=  sizeof(Test_key);
 > +  // We sort our keys as strings, so erase all the element pointers first.
 > +  for (int ii= 0; ii < array_size(m_keys.key_data); ++ii)
 > +    m_keys.key_data[ii].element= NULL;
 > +
 > +  my_string_ptr_sort(base, items, size);
 > +  for (int ii= 0; ii < num_elements/2; ++ii)
 > +  {
 > +    Test_key *sorted_key= m_keys.key_ptrs[ii];
 > +    EXPECT_EQ(ii, sorted_key->key);
 > +  }
 > +}
 > +
 > +
 > +/*
 > +  A test of the function get_merge_many_buffs_cost_fast()
 > + */
 > +TEST(CostEstimationTest, merge_many_buff)
 > +{
 > +  ha_rows num_rows= 512;
 > +  ulong num_keys= 100;
 > +  ulong row_length= 100;
 > +  double prev_cost= 0.0;
 > +  while (num_rows <= MAX_FILE_SIZE/4)
 > +  {
 > +    double merge_cost=
 > +      get_merge_many_buffs_cost_fast(num_rows / num_keys,
 > +                                     num_keys,
 > +                                     num_rows % num_keys,
 > +                                     row_length);
 > +    EXPECT_LT(0.0, merge_cost);
 > +    EXPECT_LT(prev_cost, merge_cost);
 > +    num_rows*= 2;
 > +    prev_cost= merge_cost;
 > +  }
 > +}
 > +

It seems that this just tests that the cost is positive and
monotonically increasing.  Would it be possible in some other way to
test that the computed cost is actually reasonable?

 > +
 > +/*
 > +  Comparison function for integers.
 > + */
 > +extern "C"

This also gives a compiler warning.

 > +int int_ptr_compare(size_t *cmp_arg, int **a, int **b)
 > +{
 > +  EXPECT_EQ(*cmp_arg, sizeof(int));
 > +
 > +  int a_num= **a;
 > +  int b_num= **b;
 > +
 > +  if (a_num > b_num)
 > +    return +1;
 > +  if (a_num < b_num)
 > +    return -1;
 > +  return 0;
 > +}
 > +
 > +
 > +/*
 > +  Generates an integer key for a given integer element.
 > + */
 > +void int_keymaker(Sort_param *sp, int *to, int *from)
 > +{
 > +  memcpy(to, from, sizeof(int));
 > +}
 > +
 > +
 > +/*
 > +  Some basic performance testing, to compute the overhead of Bounded_queue.
 > +  Run with 'bounded_queue-t --disable-tap-output' to see the
 > +  millisecond output from Google Test.
 > + */

I wonder whether it would make sense to have a separate file for this
performance test.

 > +const int num_rows= 10000;
 > +const int row_limit= 100;
 > +const int num_iterations= 1000;
 > +
 > +class PerfTestSmall : public ::testing::Test
 > +{
 > +public:
 > +  /*
 > +    The extra overhead of malloc/free should be part of the measurement,
 > +    so we do not define the key container as a member here.
 > +  */
 > +  typedef Key_container<row_limit, int>  Container;
 > +  enum { limit= row_limit };
 > +};
 > +
 > +class PerfTestLarge : public ::testing::Test
 > +{
 > +public:
 > +  /*
 > +    The extra overhead of malloc/free should be part of the measurement,
 > +    so we do not define the key container as a member here.
 > +  */
 > +  typedef Key_container<num_rows, int>  Container;
 > +  enum { limit= num_rows };
 > +};
 > +
 > +
 > +/*
 > +  Test with Bounded_queue size == <limit>.
 > + */
 > +TEST_F(PerfTestSmall, insert_and_sort)
 > +{
 > +  for (int it= 0; it < num_iterations; ++it)
 > +  {
 > +    Container *keys= new Container;
 > +    srand(0);
 > +    Bounded_queue<int, int> queue;
 > +    EXPECT_EQ(0, queue.init(limit, 0, TRUE, int_ptr_compare,
 > +                            sizeof(int), &int_keymaker, NULL,
 > +                            keys->key_ptrs));
 > +    for (int ix= 0; ix < limit; ++ix)
 > +    {
 > +      int data= rand();
 > +      queue.push(&data);
 > +    }
 > +    my_string_ptr_sort((uchar*)&keys->key_ptrs[0],
 > +                       queue.num_elements(), sizeof(int));
 > +    delete keys;
 > +  }
 > +}
 > +
 > +
 > +/*
 > +  Test with Bounded_queue size == <number of rows>
 > + */
 > +TEST_F(PerfTestLarge, insert_and_sort)
 > +{
 > +  for (int it= 0; it < num_iterations; ++it)
 > +  {
 > +    Container *keys= new Container;
 > +    srand(0);
 > +    Bounded_queue<int, int> queue;
 > +    EXPECT_EQ(0, queue.init(limit, 0, TRUE, int_ptr_compare,
 > +                            sizeof(int), &int_keymaker, NULL,
 > +                            keys->key_ptrs));
 > +    for (int ix= 0; ix < limit; ++ix)
 > +    {
 > +      int data= rand();
 > +      queue.push(&data);
 > +    }
 > +    my_string_ptr_sort((uchar*)&keys->key_ptrs[0],
 > +                       queue.num_elements(), sizeof(int));
 > +    delete keys;
 > +  }
 > +}
 > +

You have two identical code parts above.  Maybe this could be made
into a separate function?

 > +
 > +/*
 > +  Test without bounded queue, i.e. insert keys into array, and sort it.
 > + */
 > +TEST_F(PerfTestLarge, without_queue)
 > +{
 > +  for (int it= 0; it < num_iterations; ++it)
 > +  {
 > +    Container *keys= new Container;
 > +    srand(0);
 > +    for (int ix= 0; ix < limit; ++ix)
 > +    {
 > +      int data= rand();
 > +      keys->key_data[ix]= data;
 > +    }
 > +    my_string_ptr_sort((uchar*)&keys->key_ptrs[0], limit, sizeof(int));
 > +    delete keys;
 > +  }
 > +}
 > +
 > +
 > +/*
 > +  Computes the overhead of setting up sort arrays, and rand() calls.
 > + */
 > +TEST_F(PerfTestLarge, no_sorting)
 > +{
 > +  for (int it= 0; it < num_iterations; ++it)
 > +  {
 > +    Container *keys= new Container;
 > +    srand(0);
 > +    for (int ix= 0; ix < limit; ++ix)
 > +    {
 > +      int data= rand();
 > +      keys->key_data[ix]= data;
 > +    }
 > +    delete keys;
 > +  }
 > +}
 > +
 > +}  // namespace

-- 
Øystein