List: Commits
From: Tor Didriksen
Date: November 24 2010 3:36pm
Subject: Re: bzr commit into mysql-next-mr-bugfixing branch (tor.didriksen:3236) WL#1393
On 2010-11-19 12:59, Øystein Grøvlen wrote:
> Hi Tor,
>
> Thanks for the patch.
> I am sorry that I have taken rather a long time on this review.
>
> Most of my comments are in-line.  In addition, I have some question on 
> testing:
>
>   - Will the existence of indexes matter for this feature?  Should
>     that be tested in some way?

The optimizer tries to use indexes in order to avoid filesort.


>   - All your tests seem to be on data that is partially sorted. Is
>     that an issue?

I don't think it is an issue; I can scramble the inserts if you like.
Note that there are *a lot* more tests of filesort in other mtr tests
(that's why I implemented --explain-protocol for mtr).


>   - Would it be an idea to have a unit test that is closer to the
>     actual usage in the MySQL code?  E.g., operates on byte arrays,
>     uses the same comparison function.  That could test some of the
>     concepts even if the actual code is not tested

I will see how much extra effort that would be.
(I agree that it would be a good idea).


>
> >       @ sql/bounded_queue.cc
> >          Since Bounded_queue is templatized, the entire 
> implementation must go in the .h file
> >          The free funciton get_merge_many_buffs_cost_fast is here so 
> that we can unit test it.
>
> Since the entire implementation of Bounded_queue is in the header file,
> I see no reason to call the above file bounded_queue.cc.  I suggest
> rather let the name reflect what it contains.  (Another alternative is
> to move the entire content to filesort.h. That file does not seem to
> contain stuff that complicates a unit test.)
>
> NB! Typo: funciton

Yes, this filename is a bit misleading now that Bounded_queue has been
templatized and moved to the .h file.
So, should I create a new pair of .h and .cc files, just for the cost 
function?


>
> >       @ sql/bounded_queue.h
> >          This is a wrapper on top of QUEUE and the queue_xxx() 
> functions.
> >       @ sql/filesort.cc
> >          Add support for Priority Queue sort for queries with a LIMIT.
> >          Remove dead code.
>
> I think you should be a bit more specific here.  For the PQ things,
> one can probably get the necessary info from the worklog, but for
> other changes a bit more detail in the commit comments would make it
> easier to understand why the changes were done (e.g. why the code is
> dead).

OK, some more commit comments.


> > +
> > +################
> > +## Test sort when source data fits to memory
>
> "fits in memory"?

yes

>
> > +
> > +SELECT * FROM t1 ORDER BY f1 ASC, f0 LIMIT 100;
> > +SELECT * FROM t1 ORDER BY f1 ASC, f0 LIMIT 30;
> > +SELECT * FROM t1 ORDER BY f1 ASC, f0 LIMIT 0;
> > +SELECT * FROM t1 ORDER BY f2 DESC, f0 LIMIT 30;
> > +SELECT * FROM t1 ORDER BY f2 DESC, f0 LIMIT 0;
> > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 20;
> > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 0;
> > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 10 OFFSET 10;
> > +SELECT * FROM t1 WHERE f1>10 ORDER BY f2, f0 LIMIT 0 OFFSET 10;
>
> I assume the reason there are no EXPLAIN queries is that the explain
> output does not reflect whether a PQ is used.

I implemented EXPLAIN, but Evgeny (as architect reviewer) wanted me to
remove it.


> > +
> > +################
> > +## Test with SQL_CALC_FOUND_ROWS
>
> Is it intentional that these tests are run with a small sort buffer?

Yes (made that a bit more explicit)


>
>
> > +################
> > +## Test with subqueries
> > +SELECT d1.* FROM t1
> > +LEFT JOIN (SELECT * FROM t1 LIMIT 30) d1 on t1.f1=d1.f1
> > +ORDER BY d1.f2 DESC LIMIT 30;
>
> I do not think the result of the above query is well defined since the
> contents of the derived table, d1, is not explicitly ordered.

???

>
> > +
> > +SELECT * FROM t1 WHERE f1 = (SELECT f1 FROM t1 ORDER BY 1 LIMIT 1);
> > +
> > +--error ER_SUBQUERY_NO_1_ROW
> > +SELECT * FROM t1 WHERE f1 = (SELECT f1 FROM t1 ORDER BY 1 LIMIT 2);
> > +
> > +DROP TABLE t1, tmp;
> > +DROP VIEW v1, v2;
> > +
> > +--echo # end of WL#1393 - Optimizing filesort with small limit
> >
>
> ...
>
> > === added file 'mysql-test/t/order_by_sortkey.test'
>
> I do not quite understand the reasoning behind the name here.  Is the
> idea that it tests the case where only the sort keys, and not addon
> fields, fits in memory?  If you cannot come up with a better name, at
> least add an explanation at the start of the file.

Added comment.

>
> > --- a/mysql-test/t/order_by_sortkey.test    1970-01-01 00:00:00 +0000
> > +++ b/mysql-test/t/order_by_sortkey.test    2010-10-21 14:12:43 +0000
> > @@ -0,0 +1,64 @@
> > +#
> > +# WL#1393 - ORDER BY with LIMIT tests
> > +#
> > +# A big test in a separate file, so that we can run the original
> > +# order_by test with --debug without timeout.
>
> Is this test so big that it should be defined as a big test?
>

No, I don't think so; it takes about 5 seconds on a debug build with --mem.
With --mem --debug, it takes 120 seconds
and produces a 3.5 GB mysqld.1.trace.
With --debug but without --mem, it takes about 200 seconds on my desktop.



> > +
> > +# Test when only sortkeys fits to memory
> > +set sort_buffer_size= 32768;
>
> Why is sort_buffer_size set in the middle of the insert statements?
>

No good reason.


> > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
> > +INSERT INTO tmp SELECT f1,f2 FROM t1;
> > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
> > +INSERT INTO tmp SELECT f1,f2 FROM t1;
> > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
> > +INSERT INTO tmp SELECT f1,f2 FROM t1;
> > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
> > +INSERT INTO tmp SELECT f1,f2 FROM t1;
> > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
> > +INSERT INTO tmp SELECT f1,f2 FROM t1;
> > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
> > +INSERT INTO tmp SELECT f1,f2 FROM t1;
> > +INSERT INTO t1(f1,f2) SELECT * FROM tmp;
> > +
> > +FLUSH STATUS;
> > +SHOW SESSION STATUS LIKE 'Sort%';
> > +
> > +SELECT * FROM t1 ORDER BY f2 LIMIT 100;
>
> I notice that if I run this query without your new feature, I get a
> different ordering of rows, but maybe this is unavoidable.  In order
> to test what you are actually testing, you will need to select columns
> that are not ordered?

???

>
> > +
> > +SHOW SESSION STATUS LIKE 'Sort%';
> > +
> > +DROP TABLE t1, tmp;
> >
>
> ...
>
> > === added file 'sql/bounded_queue.cc'
> > --- a/sql/bounded_queue.cc    1970-01-01 00:00:00 +0000
> > +++ b/sql/bounded_queue.cc    2010-10-21 14:12:43 +0000
> > @@ -0,0 +1,67 @@
> > +/* Copyright (c) 2010, Oracle and/or its affiliates. All rights 
> reserved.
> > +
> > +   This program is free software; you can redistribute it and/or 
> modify
> > +   it under the terms of the GNU General Public License as 
> published by
> > +   the Free Software Foundation; version 2 of the License.
> > +
> > +   This program is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +   GNU General Public License for more details.
> > +
> > +   You should have received a copy of the GNU General Public License
> > +   along with this program; if not, write to the Free Software
> > +   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 
> 02111-1307  USA */
> > +
> > +#include "bounded_queue.h"
>
> Why include bounded_queue.h? It does not seem to be needed.

The declaration and documentation are in the header file.


>
> > +#include "sql_const.h"
> > +#include "sql_sort.h"
> > +
> > +
>
> I miss documentation for this function, and an explanation of the
> theory behind the calculations.
>
> > +double get_merge_many_buffs_cost_fast(ha_rows num_buffers, ha_rows 
> max_n_elems,
> > +                                      ha_rows last_n_elems, uint 
> elem_size)
> > +{
>
> There is another function get_merge_many_buffs_cost, that seems to
> compute something that is not quite the same.  I think you should use
> another name here.  This function seems to include not just the cost
> of merging buffers, but also the cost of sorting.
>

get_merge_many_buffs_cost_fast() is a simplified, and faster, version
of get_merge_many_buffs_cost(). I added a comment about that.


> > +  double total_cost;
> > +  const double MERGEBUFF_double= (double) MERGEBUFF;
> > +
> > +  /* Calc CPU cost of sorting each buffer */
>
> Each buffer?  Seems more like the cost of sorting all buffers.

OK

>
> > +  total_cost= ( num_buffers * max_n_elems * log(1.0 + max_n_elems) +
> > +                last_n_elems * log(1.0 + last_n_elems) )
> > +    / TIME_FOR_COMPARE_ROWID;
>
> Why do we add 1.0 in the formula above?  If max_n_elems is 1, I would
> assume the cost to sort was 0.

I guess I got last_n_elems == 0 and got NaN, so I added 1.0
to both arguments to log().
I guess I would need special handling of max_n_elems == 0 if I removed the 1.0.


>
> > +
> > +  /* Simulate behavior of merge_many_buff(). */
> > +  while (num_buffers>= MERGEBUFF2)
> > +  {
> > +    /* Calculate # of calls to merge_buffers() */
> > +    ha_rows loop_limit= num_buffers - MERGEBUFF*3/2;
>
> loop_limit is a bit too general to me.  Would it be possible to come
> up with a name that better describes what it represents?

Compare with the function we are simulating: merge_many_buff()
for (i=0 ; i <= *maxbuffer-MERGEBUFF*3/2 ; i+=MERGEBUFF)
                 ^^^^^^^^^^^^^^^^^^^^^^^^

>
> > +    ha_rows num_merge_calls= 1 + loop_limit/MERGEBUFF;
> > +    ha_rows i_end_value= num_merge_calls * MERGEBUFF;
> > +
> > +    /* Cost of merge sort 'num_merge_calls' */
> > +    total_cost+= num_merge_calls *
> > +      ( 2.0 * (max_n_elems * MERGEBUFF_double * elem_size) / IO_SIZE
> > +        + max_n_elems * MERGEBUFF_double * log(MERGEBUFF_double) /
> > +          (TIME_FOR_COMPARE_ROWID * M_LN2));
> > +
> > +    /* # of records in last buffer */
> > +    last_n_elems+= (num_buffers - i_end_value) * max_n_elems;
> > +
> > +    /* cost of merge sort of last buffer */
> > +    total_cost+= 2.0 * ((double)last_n_elems * elem_size) / IO_SIZE
> > +      + double(last_n_elems) * log(MERGEBUFF_double) /
> > +        (TIME_FOR_COMPARE_ROWID * M_LN2);
> > +
> > +    num_buffers= num_merge_calls;
> > +    max_n_elems*= MERGEBUFF;
> > +  }
> > +
>
> It seems to me that it should be possible to avoid this loop, but it
> will probably require some complex expressions involving logarithms to
> get the exact same estimate. But how precise do we really need to
> be here?  In the end, we are going to compare this to a number that
> are based on pretty coarse heuristics (PQ_slowness).
>
> Will it not be possible to come up with a good-enough estimate that
> involves a bit less computation?

I guess it is possible.
The number of iterations through this loop is < 20,
so it doesn't cost much to keep the "simulate the real sort" approach.


>
> > +  /* Simulate final merge_buff call. */
> > +  last_n_elems+= max_n_elems*num_buffers;
> > +  total_cost+= 2.0 * ((double)last_n_elems * elem_size) / IO_SIZE
> > +    + double(last_n_elems) * log(num_buffers + 1.0) /
> > +      (TIME_FOR_COMPARE_ROWID * M_LN2);
>
> The same formula seems to be used several times here.  I think it
> should be factored into a separate function.

OK


>
> > +/**
> > +  A priority queue with a fixed, limited size.
> > +
> > +  This is a wrapper on top of QUEUE and the queue_xxx() functions.
> > +  It keeps the top N elements which are inserted.
>
> I think it should be explained a bit what is meant by top N here.
> I also think it should be described what Element_type and Key_type
> represents, and how they relate.

OK, tried to improve documentation.


>
> > + */
> > +template<typename Element_type, typename Key_type>
> > +class Bounded_queue
> > +{
> > +public:
> > +  Bounded_queue()
> > +  {
> > +    memset(&m_queue, 0, sizeof(m_queue));
> > +  }
> > +
> > +  ~Bounded_queue()
> > +  {
> > +    delete_queue(&m_queue);
> > +  }
> > +
> > +  /**
> > +     Function for making sort-key from input data.
> > +     @param param Sort parameters.
> > +     @param to    Where to put the key.
> > +     @param from  The input data.
> > +  */
> > +  typedef void (*keymaker_function)(Sort_param *param,
> > +                                    Key_type *to,
> > +                                    Element_type *from);
> > +
> > +  /**
> > +     Function for comparing two keys.
> > +     @param  n Pointer to number of bytes to compare.
> > +     @param  a First key.
> > +     @parm   b Second key.
> > +     @retval a<=>  b.
>
> Is it well known what <=> is supposed to mean?  At least, this is a
> different meaning from the MySQL <=> operator.  Hence, I think this
> should be more explicit.

OK, grabbed documentation string from 'man perlop'


>
> > +   */
> > +  typedef int (*compare_function)(size_t *v, Key_type **a, Key_type 
> **b);
> > +
> > +  /**
> > +    Initialize the queue.
> > +
> > +    @param max_elements   The size of the queue.
> > +    @param offset_to_key  Offset to key in elements stored in the 
> queue.
>
> When is this needed?  So far, it does not seem to be used.  Do we
> really need to expose this?  If yes, I think a better explanations is
> needed, and the use of it should also be included in a unit test.

Maybe I should remove it now, and re-introduce it later if we want to
use Bounded_queue for BUFFPEK sorting (in filesort.cc)?


>
> > +    @param max_at_top     Set to 1 if you want biggest element on top.
>
> Why is this parameter of type pbool?  This is as far as I can tell,
> the first use of pbool in the entire sql directory.  Why not use an
> ordinary bool here?  What the underlying implementation use, should
> not be an argument for what is exposed in the API.

good point.


>
> > +    @param compare        Compare function for elements, takes 3 
> arguments.
> > +                          If NULL, we use 
> get_ptr_compare(compare_length).
> > +    @parem compare_length Lenght of the data used for sorting.
>
> It is a bit unclear to me what "length of the data" really means.  Are
> we talking about the length of the keys to be compared, the length of
> each record to be sorted, the length of the entire data set?

Tried to clarify.


>
> Typo: @parem
> Typo: Lenght
>
>
> > +    @param keymaker       Function which generates keys for elements.
> > +    @param sort_param     Sort parameters.
> > +    @param sort_keys      Array of pointers to keys to sort.
> > +
> > +    @retval 0 OK, 1 Could not allocate memory.
> > +
> > +    We do *not* take ownership of any of the input pointer arguments.
> > +   */
> > +  int init(ha_rows max_elements, uint offset_to_key, pbool max_at_top,
> > +           compare_function compare, size_t compare_length,
> > +           keymaker_function keymaker, Sort_param *sort_param,
> > +           Key_type **sort_keys);
> > +
> > +  /**
> > +    Pushes an element on the queue.
> > +    If the queue is already full, we discard the smallest (or
> > biggest) element.
>
> Maybe you should rather define "top-N" and stick with that.  "smallest
> (or biggest)" only spreads FUD ;-)

Tried to clarify.


>
> > +
> > +    @param element        The element to be pushed.
> > +   */
> > +  void push(Element_type *element);
> > +
> > +  /**
> > +    Removes an element from the queue.
>
> Any element? Maybe it should say top element or something.

Tried to clarify.


>
> > +
> > +    @retval Pointer to the removed element.
>
> But what is returned is Key_type, not Element_Type.

Tried to clarify.


>
> > +   */
> > +  Key_type **pop()
> > +  {
> > +    return (Key_type**) queue_remove(&m_queue, 0);
> > +  }
> > +
> > +  /**
> > +    The number of elements in the queue.
> > +   */
> > +  uint num_elements() const { return m_queue.elements; }
> > +
> > +  /**
> > +    Is the queue initialized?
> > +   */
> > +  bool is_initialized() const { return m_queue.max_elements>  0; }
> > +
> > +private:
> > +  Key_type         **m_sort_keys;
> > +  size_t             m_compare_length;
> > +  keymaker_function  m_keymaker;
> > +  Sort_param        *m_sort_param;
> > +  st_queue           m_queue;
> > +};
> > +
> > +
> > +template<typename Element_type, typename Key_type>
> > +int Bounded_queue<Element_type, Key_type>::init(ha_rows max_elements,
> > +                                                uint offset_to_key,
> > +                                                pbool max_at_top,
> > +                                                compare_function 
> compare,
> > +                                                size_t compare_length,
> > +                                                keymaker_function 
> keymaker,
> > +                                                Sort_param 
> *sort_param,
> > +                                                Key_type **sort_keys)
> > +{
> > +  DBUG_ASSERT(sort_keys != NULL);
> > +
> > +  m_sort_keys=      sort_keys;
> > +  m_compare_length= compare_length;
> > +  m_keymaker=       keymaker;
> > +  m_sort_param=     sort_param;
> > +  // init_queue() takes an uint, and also does (max_elements + 1)
> > +  if (max_elements>= (UINT_MAX - 1))
> > +    return 1;
> > +  if (compare == NULL)
> > +    compare= 
> reinterpret_cast<compare_function>(get_ptr_compare(compare_length));
> > +  // We allocate space for one extra element, for replace when 
> queue is full.
> > +  return init_queue(&m_queue, (uint) max_elements + 1, 
> offset_to_key, max_at_top,
> > + reinterpret_cast<queue_compare>(compare),&m_compare_length);
> > +}
>
> I do not quite understand how using a default compare function with
> a different number of arguments may work.  According to the
> documentation for st_queue, the compare function should take 3
> arguments.
>

get_ptr_compare() returns a pointer to a compare function.


> > +
> > +
> > +template<typename Element_type, typename Key_type>
> > +void Bounded_queue<Element_type, Key_type>::push(Element_type 
> *element)
> > +{
> > +  DBUG_ASSERT(is_initialized());
> > +  if (queue_is_full((&m_queue)))
> > +  {
> > +    Key_type **pq_top= reinterpret_cast<Key_type 
> **>(queue_top(&m_queue));
> > +    (*m_keymaker)(m_sort_param, *pq_top, element);
> > +    queue_replaced(&m_queue);
>
> Should not the element be discarded if the key is greater than all
> elements currently present?

I don't want to make any assumptions about the implementation of QUEUE.
It is easier to just let queue_replaced() handle the comparison,
in case the new element should be discarded rather than inserted.


>
> > +  } else {
> > +    (*m_keymaker)(m_sort_param, m_sort_keys[m_queue.elements], 
> element);
> > +    queue_insert(&m_queue,
> > + reinterpret_cast<uchar*>(&m_sort_keys[m_queue.elements]));
> > +  }
> > +}
> > +
> > +
> > +/*
> > +  Calculate cost of merge sort
> > +
> > +    @param num_buffers    # of merge buffers
> > +    @param max_n_elems    max # of keys in memory buffer
> > +    @param last_n_elems   # of keys in last buffer
> > +    @param elem_size      Size of each element.
> > +
> > +    Calculates cost of merge sort by simulating call to 
> merge_many_buff().
> > +
> > +  @retval
> > +    computed cost of merge sort
>
> What is the unit for this cost value?

furlongs per fortnight


>
> > +
> > +  @note
> > +    Declared here in order to be able to unit test it,
> > +    since library dependencies have not been sorted out yet.
> > +*/
> > +double get_merge_many_buffs_cost_fast(ha_rows num_buffers, ha_rows 
> max_n_elems,
> > +                                      ha_rows last_n_elems, uint 
> elem_size);
> > +
> > +#endif  // BOUNDED_QUEUE_INCLUDED
> >
> > === modified file 'sql/filesort.cc'
> > --- a/sql/filesort.cc    2010-08-04 10:34:01 +0000
> > +++ b/sql/filesort.cc    2010-10-21 14:12:43 +0000
> > @@ -32,11 +32,27 @@
> >   #include "probes_mysql.h"
> >   #include "sql_test.h"                           // TEST_filesort
> >   #include "opt_range.h"                          // SQL_SELECT
> > +#include "bounded_queue.h"
> > +#include "sql_select.h"
> >
> >   #ifndef THREAD
> >   #define SKIP_DBUG_IN_FILESORT
> >   #endif
> >
> > +/*
> > +  How much Priority Queue sort is slower than qsort.
> > +  For qsort we have an average case complexity of O(n*log(n)) 
> comparisons.
> > +  When using a priority queue, we have log(n) comparisons each time we
> > +  push a new element. For PQ we also call qsort at the end.
>
> But this qsort at the end is for a smaller number of elements so I do
> not follow the logic here.

That depends on the number of records in the table, and the LIMIT.
If num_records == limit, I would expect PQ to be roughly twice as
expensive.
First n*log(n) comparisons when inserting, then n*log(n) comparisons
for qsort.

It seems like my comment is only confusing.
Should I simply say 3.0 "based on measurements"?


>
> > +  Measurements show that there is some extra overhead in QUEUE,
> > +  so we set it to 3.0 rather than 2.0.
>
> I can accept that 3.0 is a good value, but I do not understand how 2.0
> come into the picture in the first place.
>
> > + */
> > +static double PQ_slowness= 3.0;
>
> Should not this be const?
> Why is the variable file scoped?  It is used in only one function.

const indeed.
File scoped for historical reasons...


>
> > +
> > +#ifdef HAVE_EXPLICIT_TEMPLATE_INSTANTIATION
> > +template class Bounded_queue<uchar, uchar>;
> > +#endif
> > +
> >   /// How to write record_ref.
> >   #define WRITE_REF(file,from) \
> >   if (my_b_write((file),(uchar*) (from),param->ref_length)) \
>
> ...
>
>
> > @@ -80,18 +132,17 @@ static void unpack_addon_fields(struct s
> >     The result set is stored in table->io_cache or
> >     table->record_pointers.
> >
> > -  @param thd           Current thread
> > -  @param table        Table to sort
> > -  @param sortorder    How to sort the table
> > -  @param s_length    Number of elements in sortorder
> > -  @param select        condition to apply to the rows
> > -  @param max_rows    Return only this many rows
> > -  @param sort_positions    Set to 1 if we want to force sorting by 
> position
> > -            (Needed by UPDATE/INSERT or ALTER TABLE)
> > -  @param examined_rows    Store number of examined rows here
> > +  @param      thd            Current thread
> > +  @param      table          Table to sort
> > +  @param      sortorder      How to sort the table
> > +  @param      s_length       Number of elements in sortorder
> > +  @param      select         Condition to apply to the rows
> > +  @param      max_rows       Return only this many rows
> > +  @param      sort_positions Set to TRUE if we want to force 
> sorting by position
> > +                             (Needed by UPDATE/INSERT or ALTER TABLE)
> > +  @param[out] examined_rows  Store number of examined rows here
> > +  @param      options        0 or OPTION_FOUND_ROWS
>
> What is the point in having a generic options parameter when there is
> only one option.  Why not just use a bool for this?

It is much easier to read at the calling site if it is a bitmask or enum;
if it is a bool, it needs a comment.


>
> >
> > -  @todo
> > -    check why we do this (param.keys--)
> >     @note
> >       If we sort by position (like if sort_positions is 1) 
> filesort() will
> >       call table->prepare_for_position().
>
> ...
>
> > +  if (param.addon_field&&
> > +      !(table_sort.addon_buf=
> > +        (uchar *) my_malloc(param.addon_length, MYF(MY_WME))))
> >         goto err;
>
> Why has this not been moved to Sort_param::init() like the rest?

Because we are allocating and setting table_sort.addon_buf,
not initializing param.


>
> ...
>
> > -  }
> > -  else
> > -  {
> > -    param.res_length= param.ref_length;
> > -    /*
> > -      The reference to the record is considered
> > -      as an additional sorted field
> > -    */
> > -    param.sort_length+= param.ref_length;
> > -  }
> > -  param.rec_length= param.sort_length+param.addon_length;
> > -  param.max_rows= max_rows;
> >
> >     if (select&&  select->quick)
> > -  {
> >       status_var_increment(thd->status_var.filesort_range_count);
> > -  }
> >     else
> > -  {
> >       status_var_increment(thd->status_var.filesort_scan_count);
> > -  }
> > -#ifdef CAN_TRUST_RANGE
> > -  if (select&&  select->quick&& 
> select->quick->records>  0L)
> > -  {
> > -    records=min((ha_rows) (select->quick->records*2+EXTRA_RECORDS*2),
> > -        table->file->stats.records)+EXTRA_RECORDS;
> > -    selected_records_file=0;
> > -  }
> > -  else
> > -#endif
> > -  {
> > -    records= table->file->estimate_rows_upper_bound();
> > -    /*
> > -      If number of records is not known, use as much of sort buffer
> > -      as possible.
> > -    */
> > -    if (records == HA_POS_ERROR)
> > -      records--;  // we use 'records+1' below.
>
> No need for this anymore?

No, we are not using 'records+1' anymore.


>
> > -    selected_records_file= 0;
> > -  }
> > +
> > +  /*
> > +    If number of rows is not known, use as much of sort buffer as 
> possible.
> > +  */
> > +  num_rows= table->file->estimate_rows_upper_bound();
> >
> >     if (multi_byte_charset&&
> >         !(param.tmp_buffer= (char*) 
> my_malloc(param.sort_length,MYF(MY_WME))))
> >       goto err;
> >
> >     memavl= thd->variables.sortbuff_size;
> > -  min_sort_memory= max(MIN_SORT_MEMORY, param.sort_length*MERGEBUFF2);
> > -  while (memavl>= min_sort_memory)
> > -  {
> > -    ulong old_memavl;
> > -    ulong keys= memavl/(param.rec_length+sizeof(char*));
> > -    param.keys=(uint) min(records+1, keys);
> > -    if ((table_sort.sort_keys=
> > -     (uchar **) make_char_array((char **) table_sort.sort_keys,
> > -                                    param.keys, param.rec_length, 
> MYF(0))))
> > -      break;
> > -    old_memavl=memavl;
> > -    if ((memavl=memavl/4*3)<  min_sort_memory&&  old_memavl> 
> min_sort_memory)
> > -      memavl= min_sort_memory;
> > +
> > +  table_sort.sort_keys=
> > +    check_if_pq_applicable(&param,&table_sort, table, num_rows, 
> memavl);
> > +  if (table_sort.sort_keys)
> > +  {
> > +    DBUG_PRINT("info", ("filesort PQ is applicable"));
> > +    const size_t compare_length= param.sort_length;
> > +    if (pq.init(param.max_rows, 0, 1,
> > +                NULL,
> > +                compare_length,
> > +&make_sortkey,&param, table_sort.sort_keys))
> > +    {
> > +      // If failed to init pq, fall back to merge-sort.
> > +      my_free(table_sort.sort_keys);
> > +      table_sort.sort_keys= NULL;
>
> When may this happen?  If it is only when something is terribly wrong
> (e.g., out-of-memory), I think a fall-back strategy is just making
> things unecessary complex, and impossible to test.

Out-of-memory, yes.
I did test it manually.


>
> > +    }
> >     }
> > -  sort_keys= table_sort.sort_keys;
> > -  if (memavl<  min_sort_memory)
> > +  else
> > +    DBUG_PRINT("info", ("filesort PQ is not applicable"));
>
> This will not catch the fall-back to merge-sort case.  Maybe it should
> be moved to the if-block below.

"filesort PQ is not applicable"
is not the same as
"filesort is appliccable, but we failed to allocate enough memory"


>
> > +
> > +  if (!table_sort.sort_keys)
>
> This could have been made into an else-statement if there was no
> fall-back. Would make things simpler.

Would you prefer "sort aborted" if we fail to allocate memory for the QUEUE?


>
> >     {
> > -    my_error(ER_OUT_OF_SORTMEMORY,MYF(ME_ERROR+ME_WAITTANG));
> > -    goto err;
> > +    const ulong min_sort_memory=
> > +      max(MIN_SORT_MEMORY, param.sort_length*MERGEBUFF2);
> > +    while (memavl>= min_sort_memory)
> > +    {
> > +      ulong old_memavl;
> > +      ulong keys= memavl/(param.rec_length+sizeof(char*));
> > +      param.num_keys_per_buffer= (uint) min(num_rows, keys);
> > +      if ((table_sort.sort_keys=
> > +           make_char_array(table_sort.sort_keys,
> > +                           param.num_keys_per_buffer, 
> param.rec_length)))
> > +        break;
> > +      old_memavl=memavl;
> > +      if ((memavl=memavl/4*3)<  min_sort_memory&&  old_memavl> 
> min_sort_memory)
> > +        memavl= min_sort_memory;
> > +    }
> > +    if (memavl<  min_sort_memory)
> > +    {
> > +      my_error(ER_OUT_OF_SORTMEMORY,MYF(ME_ERROR+ME_WAITTANG));
> > +      goto err;
> > +    }
> >     }
> > +
> > +  sort_keys= table_sort.sort_keys;
> >     if (open_cached_file(&buffpek_pointers,mysql_tmpdir,TEMP_PREFIX,
> >                  DISK_BUFFER_SIZE, MYF(MY_WME)))
> >       goto err;
> >
> > -  param.keys--;              /* TODO: check why we do this */
> >     param.sort_form= table;
> >     param.end=(param.local_sortorder=sortorder)+s_length;
> > -  if 
> ((records=find_all_keys(&param,select,sort_keys,&buffpek_pointers,
> > - &tempfile, selected_records_file)) ==
> > +  if ((num_rows= find_all_keys(&param, select, 
> sort_keys,&buffpek_pointers,
> > +&tempfile,
> > +                               pq.is_initialized() ?&pq : NULL,
>
> I think things would be a bit more readable if the above line was done
> first in a separate statement.

You mean a local variable to store the result of this ?: expression?
  pq.is_initialized() ? &pq : NULL


>
> > +&found_rows)) ==
> >         HA_POS_ERROR)
> >       goto err;
> >     maxbuffer= (uint) (my_b_tell(&buffpek_pointers)/sizeof(*buffpek));
> >
> >     if (maxbuffer == 0)            // The whole set is in memory
> >     {
> > -    if (save_index(&param,sort_keys,(uint) records,&table_sort))
> > +    if (save_index(&param, sort_keys, (uint) num_rows,&table_sort))
> >         goto err;
> >     }
> >     else
>
> ...
>
> > @@ -299,9 +333,17 @@ ha_rows filesort(THD *thd, TABLE *table,
> >               outfile))
> >         goto err;
> >     }
> > -  if (records>  param.max_rows)
> > -    records=param.max_rows;
> > -  error =0;
> > +
> > +  if (pq.is_initialized()&&  (options&  OPTION_FOUND_ROWS))
> > +  {
> > +    DBUG_PRINT("info", ("PQ and OPTION_FOUND_ROWS !!"));
> > +  }
>
> I would suggest a more descriptive text.  And why two "!" ?

Sorry, leftover dbug stuff (I guess I was excited when I was able to
make that feature work ....)


>
> > +
> > +  if (options&  OPTION_FOUND_ROWS)
> > +    num_rows= found_rows;
> > +  else if (num_rows>  param.max_rows)
> > +    num_rows= param.max_rows;
>
> I think a comment explaining the significance of the above would be
> good.

hmm, good question.
I'll get back to you on that one.


>
> > +  error= 0;
> >
> >    err:
> >     my_free(param.tmp_buffer);
>
> ...
>
> > @@ -366,23 +409,22 @@ void filesort_free_buffers(TABLE *table,
> >
> >   /** Make a array of string pointers. */
> >
> > -static char **make_char_array(char **old_pos, register uint fields,
> > -                              uint length, myf my_flag)
> > +static uchar **make_char_array(uchar **old_pos, uint fields, uint 
> length)
> >   {
> > -  register char **pos;
> > -  char *char_pos;
> >     DBUG_ENTER("make_char_array");
> >
> >     if (old_pos ||
> > -      (old_pos= (char**) my_malloc((uint) 
> fields*(length+sizeof(char*)),
> > -                   my_flag)))
> > +      (old_pos=
> > +       (uchar**) my_malloc(fields * (length + sizeof(char*)), 
> MYF(0))))
>
> Why sizeof(char*) here?  Would it be more consistent with uchar*?

OK


>
> >     {
> > -    pos=old_pos; char_pos=((char*) (pos+fields)) -length;
> > -    while (fields--) *(pos++) = (char_pos+= length);
> > +    uchar **pos= old_pos;
> > +    uchar *char_pos= ((uchar*) (pos+fields)) - length;
> > +    while (fields--)
> > +      *(pos++) = (char_pos+= length);
> >     }
> >
> >     DBUG_RETURN(old_pos);
> > -} /* make_char_array */
> > +}
> >
> >
> >   /** Read 'count' number of buffer pointers into memory. */
> > @@ -470,20 +512,26 @@ static void dbug_print_record(TABLE *tab
> >     @param buffpek_pointers  File to write BUFFPEKs describing 
> sorted segments
> >                              in tempfile.
> >     @param tempfile          File to write sorted sequences of 
> sortkeys to.
> > -  @param indexfile         If !NULL, use it for source data 
> (contains rowids)
> > +  @param pq                If !NULL, use it for keeping top N elements
> > +  @param [out] found_rows  The number of FOUND_ROWS().
> >
> >     @note
> >       Basic idea:
> >       @verbatim
> >        while (get_next_sortkey())
> >        {
> > -       if (no free space in sort_keys buffers)
> > +       if (using priority queue)
> > +         push sort key into queue
> > +       else
> >          {
> > -         sort sort_keys buffer;
> > -         dump sorted sequence to 'tempfile';
> > -         dump BUFFPEK describing sequence location into 
> 'buffpek_pointers';
> > +         if (no free space in sort_keys buffers)
> > +         {
> > +           sort sort_keys buffer;
> > +           dump sorted sequence to 'tempfile';
> > +           dump BUFFPEK describing sequence location into 
> 'buffpek_pointers';
> > +         }
> > +         put sort key into 'sort_keys';
> >          }
> > -       put sort key into 'sort_keys';
> >        }
>
> Do the first line of the function need updating?  I guess you are not
> always writing into a tempfile anymore?

The behaviour has not changed in that respect;
it would previously also write only when it ran out of space in the buffer.

Tried to clarify.


>
> >        if (sort_keys has some elements&&  dumped at least once)
> >          sort-dump-dump as above;
>
> ...
>
> > @@ -577,16 +628,6 @@ static ha_rows find_all_keys(SORTPARAM *
> >       }
> >       else                    /* Not quick-select */
> >       {
> > -      if (indexfile)
> > -      {
> > -    if (my_b_read(indexfile,(uchar*) ref_pos,ref_length)) /* 
> purecov: deadcode */
> > -    {
> > -      error= my_errno ? my_errno : -1;        /* Abort */
> > -      break;
> > -    }
> > -    error= file->ha_rnd_pos(sort_form->record[0], next_pos);
> > -      }
> > -      else
> >         {
>
> I appreciate that you have tried to reduced the amount of diff for the
> reviewer, but it would be good with a follow-up patch to reformat such
> things.
>

Sure.


>
> >       error= file->ha_rnd_next(sort_form->record[0]);
> >       if (!flag)
>
> ...
>
> > @@ -616,14 +657,23 @@ static ha_rows find_all_keys(SORTPARAM *
> >       if (!error&&  (!select ||
> >                      (!select->skip_record(thd,&skip_record)&& 
> !skip_record)))
> >       {
> > -      if (idx == param->keys)
> > +      ++(*found_rows);
> > +      if (pq)
> >         {
> > -    if (write_keys(param,sort_keys,idx,buffpek_pointers,tempfile))
> > -      DBUG_RETURN(HA_POS_ERROR);
> > -    idx=0;
> > -    indexpos++;
> > +        pq->push(ref_pos);
> > +        idx= pq->num_elements();
> > +      }
> > +      else
> > +      {
> > +        if (idx == param->num_keys_per_buffer)
>
> I think I prefer "else if" but I guess that is a matter of taste.

There is a call to make_sortkey() below, which would then need an if (!pq) guard.


>
> > +        {
> > +          if (write_keys(param,sort_keys, idx, buffpek_pointers, 
> tempfile))
> > +             DBUG_RETURN(HA_POS_ERROR);
> > +          idx= 0;
> > +          indexpos++;
> > +        }
> > +        make_sortkey(param, sort_keys[idx++], ref_pos);
> >         }
> > -      make_sortkey(param,sort_keys[idx++],ref_pos);
> >       }
> >       else
> >         file->unlock_row();
>
> ...
>
> > @@ -1036,17 +1088,17 @@ static void register_used_fields(SORTPAR
> >   }
> >
> >
> > -static bool save_index(SORTPARAM *param, uchar **sort_keys, uint 
> count,
> > +static bool save_index(Sort_param *param, uchar **sort_keys, uint 
> count,
> >                          FILESORT_INFO *table_sort)
> >   {
> >     uint offset,res_length;
> >     uchar *to;
> >     DBUG_ENTER("save_index");
> >
> > -  my_string_ptr_sort((uchar*) sort_keys, (uint) count, 
> param->sort_length);
> > +  my_string_ptr_sort((uchar*) sort_keys, count, param->sort_length);
> >     res_length= param->res_length;
> >     offset= param->rec_length-res_length;
> > -  if ((ha_rows) count>  param->max_rows)
> > +  if ((ha_rows) count>  param->max_rows&&  param->max_rows>
>  0)
>
> Should param->max_rows ever be <= 0? If not, I would prefer an assert
> here.

LIMIT 0 is perfectly legal, and max_rows is unsigned.


>
> >       count=(uint) param->max_rows;
> >     if (!(to= table_sort->record_pointers=
> >           (uchar*) my_malloc(res_length*count, MYF(MY_WME))))
> > @@ -1060,10 +1112,127 @@ static bool save_index(SORTPARAM *param,
> >   }
> >
> >
> > +/**
> > +  Test whether priority queue is worth using to get top elements of an
> > +  ordered result set. If it is then allocates buffer for required 
> amount of
>
> I suggest: "If it is, ..."

of course


>
> It is not clear to me, why it is a good idea to do the allocation of
> the buffer within this function.  It seems it leads to unnecessary
> duplication of calls to make_char_array().
>
>

It isn't necessarily allocated here; on second execution it is only
re-initialized. That behaviour was introduced by:
Fixed bug #28811: crash for a query containing a subquery with
ORDER BY and LIMIT 1.

We also take extra actions when "stripping off addon fields",
which depend on the successful allocation.

If we fail to allocate for PQ, we fall back to the loop in filesort:
     while (memavl >= min_sort_memory)
     {
       try to allocate
       if failed to allocate, decrease memavl.
     }



> > +  records
> > +
> > +  @param param     Sort parameters.
> > +  @param info      Filesort information.
> > +  @param table     Table to sort.
> > +  @param num_rows  Estimate of number of rows in source record set.
>
> If this is just an estimate, do you risk that the buffer will be too
> small?  What happens then?

This is used for cost estimation.
Several storage engines will return HA_POS_ERROR.


>
> > +  @param memavl    Memory available for sorting.
>
> I think memavl is a bit cryptic.

It has been called that for about 8 years, but sure...


>
> > +
> > +  DESCRIPTION
>
> I think you should consider rewriting the description below.  I do not
> feel it clearly describes the logic of the function.
>
> > +    Function tests applicability of PQ to get L top records for 
> queries like
>
> "L top records". L should be defined.  The example in the line below
> uses max_rows for L it seems.
>
> > +    SELECT ... FROM t ORDER BY a1,...,an LIMIT max_rows;
> > +    It tests whether there is enough memory to contain max_rows 
> elements,
> > +    whether max_rows is small enough to make PQ perform faster that 
> qsort
>
> Is there a missing "and" after comma, or is this the start of a list
> that was not completed?
>
> > +    on table containing 'num_rows' records. If available memory is 
> enough to
>
> "on table". Above "source record set" is used. I think same term
> should be used in both places.
>
> > +    store only 'max_rows' sort keys without addon data, it modifies 
> 'table'
>
> I would say, "is enough to store 'max_rows' sort keys, but not all
> addon data"

Tried to clarify.

>
> > +    to sort with references to tuples instead of additional data. 
> In evaluations
> > +    it uses 'param' to get sort key, addon data and reference 
> sizes, 'table' -
> > +    to get table scan time.
> > +
> > +   @retval
> > +    allocated buffer - if it's ok to use PQ
> > +    NULL - PQ will be slower than merge-sort, or there is not
> > enough memory.
>
> Is this return value according to the coding rules we discussed in
> optimizer team?  (return value !=0 means error)

Rewritten.

>
> > +*/
> > +
> > +uchar **check_if_pq_applicable(Sort_param *param, FILESORT_INFO *info,
> > +                               TABLE *table, ha_rows num_rows, 
> ulong memavl)
> > +{
> > +  DBUG_ENTER("check_if_pq_applicable");
> > +
> > +  if (param->max_rows == HA_POS_ERROR)
> > +  {
> > +    DBUG_PRINT("info", ("max_rows = HA_POS_ERROR"));
> > +    DBUG_RETURN(NULL);
>
> Can this happen?


Yes, if there is no LIMIT clause, param->max_rows == HA_POS_ERROR.

>
> > +  }
> > +
> > +  if (param->max_rows + 2>= UINT_MAX)
> > +  {
> > +    DBUG_PRINT("info", ("Too large LIMIT"));
>
> Please, use a more specific text here.

Why? This is within check_if_pq_applicable, so the trace will be:

T@4    : | | | | | | | | | >check_if_pq_applicable
T@4    : | | | | | | | | | | info: Too large LIMIT
T@4    : | | | | | | | | | <check_if_pq_applicable


>
> > +    DBUG_RETURN(NULL);
> > +  }
> > +
> > +  uchar **sort_keys;
>
> In this context, it will not
>
> Seems to me that the code below would be simpler if if you initialized
> this to NULL, and just had a single return at the end.
>

this is now rewritten, please have another look.


> > +  ulong num_available_keys= memavl/(param->rec_length + 
> sizeof(char*));
> > +  param->num_keys_per_buffer= (uint) param->max_rows + 1;
> > +  if (num_rows<  num_available_keys)
> > +  {
> > +    /* The whole source set fits into memory. */
> > +    if (param->max_rows<  num_rows/PQ_slowness )
>
> Some comment on PQ_slowness would be good here.

There are already six lines describing PQ_slowness.
There's
     else
     {
       /* PQ will be slower */
       DBUG_RETURN(false);
     }

I don't know what more to say...


>
> > +    {
> > +      if ((sort_keys= make_char_array(info->sort_keys,
> > +                                      param->num_keys_per_buffer,
> > +                                      param->rec_length)))
> > +        DBUG_RETURN(sort_keys);
> > +    }
> > +    else
> > +    {
> > +      /* PQ will be slower */
> > +      DBUG_RETURN(NULL);
> > +    }
> > +  }
> > +
> > +  if (param->max_rows + 1<  num_available_keys&&
>
> Why +1?

rewritten to
   if (param->num_keys_per_buffer < num_available_keys)


>
> > +      (sort_keys= make_char_array(info->sort_keys,
> > +                                  param->num_keys_per_buffer,
> > +                                  param->rec_length)))
> > +    DBUG_RETURN(sort_keys);
> > +
> > +  /* Try to strip off addon fields */
> > +  if (param->addon_field)
> > +  {
> > +    const ulong row_lenght=
>
> Typo: row_lenght

OK


>
> > +      param->sort_length + param->ref_length + sizeof(char*);
>
> I would like a comment explaining what these 3 parts of the row
> represents.

See Sort_param and make_char_array (which also adds sizeof(pointer)).


>
> > +    num_available_keys= memavl / row_lenght;
> > +
> > +    if (param->max_rows + 1<  num_available_keys)
> > +    {
> > +      const double merge_cost=
> > +        get_merge_many_buffs_cost_fast(num_rows / num_available_keys,
> > +                                       num_available_keys,
> > +                                       num_rows % num_available_keys,
> > +                                       row_lenght);
>
> I think the division/remainder logic could be kept within the cost
> function.  Is there a need to burden the caller with that.

OK


>
> > +      const double pq_cost=
> > +        (PQ_slowness * num_rows + param->max_rows + 1)*
> > +        log(param->max_rows + 1.0) / TIME_FOR_COMPARE_ROWID +
> > +        param->max_rows * table->file->scan_time();
>
> I think this cost function needs an explanation.  For one thing, why
> is PQ_slowness placed where it is?

Added a comment.


>
> > +
> > +      if (merge_cost<  pq_cost)
> > +        DBUG_RETURN(NULL);
> > +
> > +      sort_keys= make_char_array(info->sort_keys,
> > +                                 param->num_keys_per_buffer,
> > +                                 param->rec_length);
> > +      if (sort_keys)
> > +      {
> > +        // Make attached data to be references instead of fields.
> > +        my_free(info->addon_buf);
> > +        my_free(info->addon_field);
> > +        info->addon_buf= NULL;
> > +        info->addon_field= 0;
>
> Seems a bit inconsistent to set one pointer to NULL and the other to
> 0.

indeed


>
> > +        param->addon_field= NULL;
> > +        param->addon_length= 0;
> > +
> > +        param->res_length= param->ref_length;
> > +        param->sort_length+= param->ref_length;
> > +        param->rec_length= param->sort_length + param->addon_length;
>
> Why do you add param->addon_length?  It has just been set to 0.


good catch.

>
> ...
>
> > === modified file 'sql/filesort.h'
> > --- a/sql/filesort.h    2010-07-02 02:58:51 +0000
> > +++ b/sql/filesort.h    2010-10-21 14:12:43 +0000
> > @@ -21,6 +21,7 @@ class SQL_SELECT;
> >   #include "my_global.h"                          /* uint, uchar */
> >   #include "my_base.h"                            /* ha_rows */
> >
> > +class JOIN;
>
> I do not think this is needed.  I see no reference to JOIN in this
> file.

leftover from explain_select() I guess.


>
> >   class SQL_SELECT;
> >   class THD;
> >   struct TABLE;
> > @@ -29,7 +30,7 @@ typedef struct st_sort_field SORT_FIELD;
> >   ha_rows filesort(THD *thd, TABLE *table, st_sort_field *sortorder,
> >                    uint s_length, SQL_SELECT *select,
> >                    ha_rows max_rows, bool sort_positions,
> > -                 ha_rows *examined_rows);
> > +                 ha_rows *examined_rows, ulonglong options);
> >   void filesort_free_buffers(TABLE *table, bool full);
> >   void change_double_for_sort(double nr,uchar *to);
> >
> >
>
> ...
>
> > === modified file 'sql/sql_select.cc'
>
> ...
>
> > @@ -3249,12 +3253,21 @@ JOIN::exec()
> >       the query. XXX: it's never shown in EXPLAIN!
> >       OPTION_FOUND_ROWS supersedes LIMIT and is taken into account.
> >         */
> > -      if (create_sort_index(thd, curr_join,
> > -                curr_join->group_list ?
> > -                curr_join->group_list : curr_join->order,
> > -                curr_join->select_limit,
> > -                (select_options&  OPTION_FOUND_ROWS ?
> > -                 HA_POS_ERROR : unit->select_limit_cnt),
> > +      DBUG_PRINT("info",("Sorting for order by/group by"));
> > +      ORDER *order_arg=
> > +        curr_join->group_list ? curr_join->group_list : 
> curr_join->order;
> > +      const ha_rows filesort_limit_arg=
> > +        (has_group_by || curr_join->tables>  1)
> > +        ? curr_join->select_limit : unit->select_limit_cnt;
>
> I think a comment with an explanation on filesort_limit_arg would be
> good.

done


>
> > +      const ha_rows select_limit_arg=
> > +        select_options&  OPTION_FOUND_ROWS
> > +        ? HA_POS_ERROR : unit->select_limit_cnt;
> > +
> > +      if (create_sort_index(thd,
> > +                            curr_join,
> > +                            order_arg,
> > +                            filesort_limit_arg,
> > +                            select_limit_arg,
> >                               curr_join->group_list ? FALSE : TRUE))
> >       DBUG_VOID_RETURN;
> >         sortorder= curr_join->sortorder;
>
> ...
>
> > @@ -18368,8 +18390,26 @@ end_send(JOIN *join, JOIN_TAB *join_tab
> >         error=join->result->send_data(*join->fields);
> >       if (error)
> >         DBUG_RETURN(NESTED_LOOP_ERROR); /* purecov: inspected */
> > -    if (++join->send_records>=
> join->unit->select_limit_cnt&&
> > -    join->do_send_rows)
> > +
> > +    ++join->send_records;
> > +    if (join->send_records>=
> join->unit->select_limit_cnt&&
> > +        !join->do_send_rows)
> > +    {
> > +      /*
> > +        If filesort is used for sorting, stop after select_limit_cnt+1
> > +        records are read. Because of optimization in some cases it can
> > +        provide only select_limit_cnt+1 records.
> > +      */
> > +      if (join->order&&
> > +          join->sortorder&&
> > +          join->select_options&  OPTION_FOUND_ROWS)
> > +      {
> > +        DBUG_PRINT("info", ("filesort NESTED_LOOP_QUERY_LIMIT"));
> > +        DBUG_RETURN(NESTED_LOOP_QUERY_LIMIT);
> > +      }
> > +    }
> > +    if (join->send_records>=
> join->unit->select_limit_cnt&&
> > +        join->do_send_rows)
>
> Seems like things would be a bit simpler with a pattern like this:
>
>   if (join->send_records>= join->unit->select_limit_cnt)
>     if (join->do_send_rows)
>       ...
>     else
>       ...

yes, this is a place where I wanted the diff to be as clean as possible.
Notice that there is an elseif 35 lines below...


>
> >       {
> >         if (join->select_options&  OPTION_FOUND_ROWS)
> >         {
>
> ...
>
> > @@ -20118,9 +20158,11 @@ create_sort_index(THD *thd, JOIN *join,
> >
> >     if (table->s->tmp_table)
> >       table->file->info(HA_STATUS_VARIABLE);    // Get record count
> > -  table->sort.found_records=filesort(thd, table,join->sortorder, 
> length,
> > -                                     select, filesort_limit, 0,
> > -&examined_rows);
> > +  table->sort.found_records= filesort(thd, table,join->sortorder, 
> length,
>
> Missing space after comma in line above.

OK


>
> > +                                      select, filesort_limit, 0,
> > +&examined_rows,
> > +                                      join->tables != 1 ? 0 :
> > +                                      join->select_options& 
> OPTION_FOUND_ROWS);
>
> Would things go wrong if OPTION_FOUND_ROWS were passed in also when
> join->tables != 1?

yes, we can do the FOUND_ROWS calculation on single table operations only.


>
> >     tab->records= table->sort.found_records;    // For SQL_CALC_ROWS
> >     if (select)
> >     {
>
> ...
>
> > === modified file 'sql/sql_sort.h'
> > --- a/sql/sql_sort.h    2010-07-02 18:15:21 +0000
> > +++ b/sql/sql_sort.h    2010-10-21 14:12:43 +0000
> > @@ -26,7 +26,7 @@ typedef struct st_sort_field SORT_FIELD;
> >
> >   class Field;
> >   struct TABLE;
> > -
> > +class THD;
>
> The only reason THD needed here, seems to be to get the value of
> thd->variables.max_length_for_sort_data in a function called
> indirectly by init.  Would it not be better to just provide that
> information explicitly?
>

yes, I did think about that, OK.


> >
> >   /* Defines used by filesort and uniques */
> >
> > @@ -71,15 +71,17 @@ struct BUFFPEK_COMPARE_CONTEXT
> >     void *key_compare_arg;
> >   };
> >
> > -typedef struct st_sort_param {
> > -  uint rec_length;          /* Length of sorted records */
> > -  uint sort_length;            /* Length of sorted columns */
> > -  uint ref_length;            /* Length of record ref. */
> > -  uint addon_length;        /* Length of added packed fields */
> > -  uint res_length;          /* Length of records in final sorted 
> file/buffer */
> > -  uint keys;                /* Max keys / buffer */
> > -  ha_rows max_rows,examined_rows;
> > -  TABLE *sort_form;            /* For quicker make_sortkey */
> > +class Sort_param {
> > +public:
> > +  uint rec_length;            /* Length of sorted records */
> > +  uint sort_length;           /* Length of sorted columns */
> > +  uint ref_length;            /* Length of record ref. */
> > +  uint addon_length;          /* Length of added packed fields */
> > +  uint res_length;            /* Length of records in final sorted 
> file/buffer */
> > +  uint num_keys_per_buffer;   /* Max keys / buffer */
>
> num keys or max keys? Name and description does not match.
>
> > +  ha_rows max_rows;           /* Select limit, or HA_POS_ERROR if 
> unlimited */
> > +  ha_rows examined_rows;      /* Number of examined rows */
> > +  TABLE *sort_form;           /* For quicker make_sortkey */
> >     SORT_FIELD *local_sortorder;
> >     SORT_FIELD *end;
> >     SORT_ADDON_FIELD *addon_field; /* Descriptors for companion 
> fields */
> > @@ -89,15 +91,22 @@ typedef struct st_sort_param {
> >     /* The fields below are used only by Unique class */
> >     qsort2_cmp compare;
> >     BUFFPEK_COMPARE_CONTEXT cmp_context;
> > -} SORTPARAM;
> > +
> > +  Sort_param()
> > +  {
> > +    memset(this, 0, sizeof(*this));
>
> This makes this header file not self-contained.  It does not import
> the definition of memset, AFAICT.

OK


>
> > +  }
> > +  void init(uint sortlen, TABLE *table, THD *thd,
> > +            ha_rows maxrows, bool sort_positions);
> > +};
> >
> >
> > -int merge_many_buff(SORTPARAM *param, uchar *sort_buffer,
> > +int merge_many_buff(Sort_param *param, uchar *sort_buffer,
> >               BUFFPEK *buffpek,
> >               uint *maxbuffer, IO_CACHE *t_file);
> >   uint read_to_buffer(IO_CACHE *fromfile,BUFFPEK *buffpek,
> >               uint sort_length);
> > -int merge_buffers(SORTPARAM *param,IO_CACHE *from_file,
> > +int merge_buffers(Sort_param *param,IO_CACHE *from_file,
> >             IO_CACHE *to_file, uchar *sort_buffer,
> >             BUFFPEK *lastbuff,BUFFPEK *Fb,
> >             BUFFPEK *Tb,int flag);
>
> You have fixed similar errors in indentation in other places.  Why not
> here?
>

Didn't notice the tabs; everything was perfectly aligned in my editor.


> ...
>
> > === modified file 'sql/uniques.cc'
> > --- a/sql/uniques.cc    2010-07-08 21:42:23 +0000
> > +++ b/sql/uniques.cc    2010-10-21 14:12:43 +0000
> > @@ -577,7 +577,6 @@ bool Unique::walk(tree_walk_action actio
> >
> >   bool Unique::get(TABLE *table)
> >   {
> > -  SORTPARAM sort_param;
> >     table->sort.found_records=elements+tree.elements_in_tree;
> >
> >     if (my_b_tell(&file) == 0)
> > @@ -612,19 +611,20 @@ bool Unique::get(TABLE *table)
> >       return 1;
> >     reinit_io_cache(outfile,WRITE_CACHE,0L,0,0);
> >
> > -  bzero((char*)&sort_param,sizeof(sort_param));
> > +  Sort_param sort_param;
> >     sort_param.max_rows= elements;
> >     sort_param.sort_form=table;
> >     sort_param.rec_length= sort_param.sort_length= 
> sort_param.ref_length=
> >       size;
> > -  sort_param.keys= (uint) (max_in_memory_size / 
> sort_param.sort_length);
> > +  sort_param.num_keys_per_buffer=
> > +    (uint) (max_in_memory_size / sort_param.sort_length);
> >     sort_param.not_killable=1;
> >
> > -  if (!(sort_buffer=(uchar*) my_malloc((sort_param.keys+1) *
> > +  if (!(sort_buffer=(uchar*) 
> my_malloc((sort_param.num_keys_per_buffer + 1) *
> >                          sort_param.sort_length,
> >                          MYF(0))))
> >       return 1;
> > -  sort_param.unique_buff= sort_buffer+(sort_param.keys*
> > +  sort_param.unique_buff= 
> sort_buffer+(sort_param.num_keys_per_buffer *
> >                          sort_param.sort_length);
> >
> >     sort_param.compare= (qsort2_cmp) buffpek_compare;
>
> It seems a bit inconsistent to introduce an init-function for
> Sort_param and only use it in 1 of 2 places where sort_param is used.
>

It was actually introduced to simplify explain_filesort(),
which was rejected by architect review.


> ...
>
> > === added file 'unittest/gunit/bounded_queue-t.cc'
>
> > +/*
> > +  Comparison function for Test_key objects.
> > + */
> > +extern "C"
>
> Declaring it as extern "C" gives a warning (Anachronism) with SUN
> compiler.  I guess it is because Bounded_queue does not declare the
> compare function as extern "C".

oops, leftovers before the templatization.


>
> > +int test_key_compare(size_t *cmp_arg, Test_key **a, Test_key **b)
> > +{
> > +  EXPECT_EQ(*cmp_arg, sizeof(int));
> > +
> > +  int a_num= (*a)->key;
> > +  int b_num= (*b)->key;
> > +
> > +  if (a_num>  b_num)
> > +    return +1;
> > +  if (a_num<  b_num)
> > +    return -1;
> > +  return 0;
> > +}
> > +
> > +
> > +/*
> > +  Generates a Test_key for a given Test_element.
> > + */
> > +void test_keymaker(Sort_param *sp, Test_key *key, Test_element 
> *element)
>
> I guess making a unit test that actually uses Sort_param would be a
> bit awkward?

I wanted unit tests that were easily debuggable;
the uchar** stuff was not so easy.


>
> > +{
> > +  int key_val= element->val;
>
> Why do you need this local variable?

again: leftovers from before templatization
(lots of casting involved, so needed for debugging)


> > +&test_keymaker, NULL, m_keys.key_ptrs));
> > +  for (int ix= 0; ix<  array_size(m_test_data); ++ix)
> > +  {
> > +    m_queue.push(&m_test_data[ix]);
> > +  }
> > +  /*
> > +   The queue should now contain [0 .. (num_elements/2 - 1)]
> > +   plus one extra element that we are not interested in.
>
> I do not think this "not interesting" aspect of Bounded_queue is
> documented.  Seems a bit awkward.

you are right, that should be handled by Bounded_queue itself, rather
than the client code.


>
> > +  */
> > +  (void) m_queue.pop();
> > +  int expected_key_val= m_queue.num_elements() - 1;
> > +  while (m_queue.num_elements()>  0)
> > +  {
> > +    Test_key **top= m_queue.pop();
> > +    int key_val= (*top)->key;
> > +    EXPECT_EQ(expected_key_val, key_val);
> > +    Test_element *element= (*top)->element;
> > +    EXPECT_EQ(expected_key_val, element->val);
> > +    --expected_key_val;
> > +  }
> > +}
> > +
> > +
> > +/*
> > +  Verifies that push, with bounded size, followed by sort() works.
> > + */
> > +TEST_F(BoundedQueueTest, insert_and_sort)
> > +{
> > +  EXPECT_EQ(0, m_queue.init(num_elements/2, 0, TRUE, test_key_compare,
> > +                            m_key_size,
> > +&test_keymaker, NULL, m_keys.key_ptrs));
> > +  for (int ix= 0; ix<  array_size(m_test_data); ++ix)
> > +  {
> > +    m_queue.push(&m_test_data[ix]);
> > +  }
> > +
> > +  uchar *base=  (uchar*)&m_keys.key_ptrs[0];
> > +  uint   items= m_queue.num_elements();
> > +  size_t size=  sizeof(Test_key);
> > +  // We sort our keys as strings, so erase all the element pointers 
> first.
> > +  for (int ii= 0; ii<  array_size(m_keys.key_data); ++ii)
> > +    m_keys.key_data[ii].element= NULL;
> > +
> > +  my_string_ptr_sort(base, items, size);
> > +  for (int ii= 0; ii<  num_elements/2; ++ii)
> > +  {
> > +    Test_key *sorted_key= m_keys.key_ptrs[ii];
> > +    EXPECT_EQ(ii, sorted_key->key);
> > +  }
> > +}
> > +
> > +
> > +/*
> > +  A test of the function get_merge_many_buffs_cost_fast()
> > + */
> > +TEST(CostEstimationTest, merge_many_buff)
> > +{
> > +  ha_rows num_rows= 512;
> > +  ulong num_keys= 100;
> > +  ulong row_lenght= 100;
> > +  double prev_cost= 0.0;
> > +  while (num_rows<= MAX_FILE_SIZE/4)
> > +  {
> > +    double merge_cost=
> > +      get_merge_many_buffs_cost_fast(num_rows / num_keys,
> > +                                     num_keys,
> > +                                     num_rows % num_keys,
> > +                                     row_lenght);
> > +    EXPECT_LT(0.0, merge_cost);
> > +    EXPECT_LT(prev_cost, merge_cost);
> > +    num_rows*= 2;
> > +    prev_cost= merge_cost;
> > +  }
> > +}
> > +
>
> It seems that this just tests that the cost is positive and
> monotonically increasing.  Would it be possible in some other way to
> test that the computed cost is actually reasonable?

This started out as an overflow/underflow test (we got NaN for certain input),
then became a test that it was actually "fast" (the inner loop was very slow).
I really don't know what a "reasonable cost" is,
and it's hard to test floating-point accuracy anyway ...



>
> > +
> > +/*
> > +  Comparison function for integers.
> > + */
> > +extern "C"
>
> Also gives warning.

gone


>
> > +int int_ptr_compare(size_t *cmp_arg, int **a, int **b)
> > +{
> > +  EXPECT_EQ(*cmp_arg, sizeof(int));
> > +
> > +  int a_num= **a;
> > +  int b_num= **b;
> > +
> > +  if (a_num>  b_num)
> > +    return +1;
> > +  if (a_num<  b_num)
> > +    return -1;
> > +  return 0;
> > +}
> > +
> > +
> > +/*
> > +  Generates an integer key for a given integer element.
> > + */
> > +void int_keymaker(Sort_param *sp, int *to, int *from)
> > +{
> > +  memcpy(to, from, sizeof(int));
> > +}
> > +
> > +
> > +/*
> > +  Some basic performance testing, to compute the overhead of 
> Bounded_queue.
> > +  Run the with 'bounded_queue-t --disable-tap-output' to see the
> > +  millisecond output from Google Test.
> > + */
>
> I wonder whether it would make sense to have a separate file for this
> performance test.
>

I don't see any point in having another executable for that?


> > +const int num_rows= 10000;
> > +const int row_limit= 100;
> > +const int num_iterations= 1000;
> > +
> > +class PerfTestSmall : public ::testing::Test
> > +{
> > +public:
> > +  /*
> > +    The extra overhead of malloc/free should be part of the 
> measurement,
> > +    so we do not define the key container as a member here.
> > +  */
> > +  typedef Key_container<row_limit, int>  Container;
> > +  enum { limit= row_limit };
> > +};
> > +
> > +class PerfTestLarge : public ::testing::Test
> > +{
> > +public:
> > +  /*
> > +    The extra overhead of malloc/free should be part of the 
> measurement,
> > +    so we do not define the key container as a member here.
> > +  */
> > +  typedef Key_container<num_rows, int>  Container;
> > +  enum { limit= num_rows };
> > +};
> > +
> > +
> > +/*
> > +  Test with Bounded_queue size ==<limit>.
> > + */
> > +TEST_F(PerfTestSmall, insert_and_sort)
> > +{
> > +  for (int it= 0; it<  num_iterations; ++it)
> > +  {
> > +    Container *keys= new Container;
> > +    srand(0);
> > +    Bounded_queue<int, int>  queue;
> > +    EXPECT_EQ(0, queue.init(limit, 0, TRUE, int_ptr_compare,
> > +                            sizeof(int),&int_keymaker, NULL, 
> keys->key_ptrs));
> > +    for (int ix= 0; ix<  limit; ++ix)
> > +    {
> > +      int data= rand();
> > +      queue.push(&data);
> > +    }
> > +    my_string_ptr_sort((uchar*)&keys->key_ptrs[0],
> > +                       queue.num_elements(), sizeof(int));
> > +    delete keys;
> > +  }
> > +}
> > +
> > +
> > +/*
> > +  Test with Bounded_queue size ==<number of rows>
> > + */
> > +TEST_F(PerfTestLarge, insert_and_sort)
> > +{
> > +  for (int it= 0; it<  num_iterations; ++it)
> > +  {
> > +    Container *keys= new Container;
> > +    srand(0);
> > +    Bounded_queue<int, int>  queue;
> > +    EXPECT_EQ(0, queue.init(limit, 0, TRUE, int_ptr_compare,
> > +                            sizeof(int),&int_keymaker, NULL, 
> keys->key_ptrs));
> > +    for (int ix= 0; ix<  limit; ++ix)
> > +    {
> > +      int data= rand();
> > +      queue.push(&data);
> > +    }
> > +    my_string_ptr_sort((uchar*)&keys->key_ptrs[0],
> > +                       queue.num_elements(), sizeof(int));
> > +    delete keys;
> > +  }
> > +}
> > +
>
> You have two identical code parts above.  Maybe this could be made
> into a separate function?

It would need to be a templatized function.

done

>
> > +
> > +/*
> > +  Test without bounded queue, i.e. insert keys into array, and sort 
> it.
> > + */
> > +TEST_F(PerfTestLarge, without_queue)
> > +{
> > +  for (int it= 0; it<  num_iterations; ++it)
> > +  {
> > +    Container *keys= new Container;
> > +    srand(0);
> > +    for (int ix= 0; ix<  limit; ++ix)
> > +    {
> > +      int data= rand();
> > +      keys->key_data[ix]= data;
> > +    }
> > +    my_string_ptr_sort((uchar*)&keys->key_ptrs[0], limit, 
> sizeof(int));
> > +    delete keys;
> > +  }
> > +}
> > +
> > +
> > +/*
> > +  Computes the overhead of setting up sort arrays, and rand() calls.
> > + */
> > +TEST_F(PerfTestLarge, no_sorting)
> > +{
> > +  for (int it= 0; it<  num_iterations; ++it)
> > +  {
> > +    Container *keys= new Container;
> > +    srand(0);
> > +    for (int ix= 0; ix<  limit; ++ix)
> > +    {
> > +      int data= rand();
> > +      keys->key_data[ix]= data;
> > +    }
> > +    delete keys;
> > +  }
> > +}
> > +
> > +}  // namespace
> >
> >
> >
> >
> >
>
