List: Commits
From: Tor Didriksen
Date: December 6 2010 9:47am
Subject: Re: bzr commit into mysql-trunk-bugfixing branch (tor.didriksen:3250) WL#1393
On 2010-12-06 09:27, Øystein Grøvlen wrote:
> Hi Tor,
>
> I will approve this worklog now.  See in-line for some minor
> suggestions that I hope you will consider.
>
> I am still not sure we have fully understood the cost aspects, but it
> seems we can be confident that it does much more good than harm, and
> that the risk of performance regressions is small.
>
> -- 
> Øystein
>
> On 11/29/10 12:21 PM, Tor Didriksen wrote:
> > #At file:///export/home/didrik/repo/next-mr-opt-team-wl1393-merge/ 
> based on revid:tor.didriksen@stripped
> >
> >   3250 Tor Didriksen    2010-11-29
> >        WL#1393 Optimizing filesort with small limit
> >
> >        Many web customers have to do
> >        "SELECT ... ORDER BY non_index_column LIMIT X",
> >
> >        When X * <Row Size> is smaller than sort_buff_size we can use
> >        the following algorithm to speed up the sort:
> >
> >        - Create a queue to hold 'limit' keys.
> >        - Scan through the table and store the first (last if DESC) 
> keys in the queue
> >        - Return values from queue
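
(Aside, for anyone reading along: a minimal standalone sketch of the idea,
using std::priority_queue instead of the actual Bounded_queue; the names
below are illustrative only, not the worklog code.)

#include <algorithm>
#include <queue>
#include <vector>

// Keep the 'limit' smallest keys seen during the scan (the ASC case),
// then return them in sorted order.
std::vector<int> top_n_sort(const std::vector<int> &keys, size_t limit)
{
  if (limit == 0)
    return std::vector<int>();
  std::priority_queue<int> pq;                 // max-heap: worst key on top
  for (size_t ix= 0; ix < keys.size(); ++ix)
  {
    if (pq.size() < limit)
      pq.push(keys[ix]);
    else if (keys[ix] < pq.top())
    {
      pq.pop();                                // discard the current worst key
      pq.push(keys[ix]);
    }
  }
  std::vector<int> result;
  while (!pq.empty())
  {
    result.push_back(pq.top());
    pq.pop();
  }
  std::sort(result.begin(), result.end());     // the final ORDER BY ... ASC
  return result;
}
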
> >       @ libmysqld/CMakeLists.txt
> >          Add Bounded_queue
> >       @ libmysqld/Makefile.am.THIS
> >          Add Bounded_queue
>
> ???
some bzr hiccup?

>
> >       @ mysql-test/include/order_by.inc
> >          Add new tests.
> >       @ mysql-test/include/select.inc
> >          Fix typo in bug number.
> >       @ mysql-test/r/order_by_none.result
> >          Added new tests.
> >       @ mysql-test/r/order_by_sortkey.result
> >          New test.
> >       @ mysql-test/r/select_none.result
> >          Fix typo in bug number.
> >       @ mysql-test/t/order_by_sortkey.test
> >          New test.
> >       @ mysys/CMakeLists.txt
> >          Compile standalone queues test executable.
> >       @ mysys/queues.c
> >          Fix un-maintained code: s/bool/my_bool/
> >       @ sql/CMakeLists.txt
> >          Add Bounded_queue to gunit tests.
> >       @ sql/Makefile.am.THIS
> >          Add Bounded_queue
>
> ???
I'll be patching the diff into a different tree anyway, so this will go
away.

>
> ...
>
> > +  /**
> > +     Function for comparing two keys.
> > +     @param  n Pointer to number of bytes to compare.
> > +     @param  a First key.
> > +     @param  b Second key.
> > +     @retval -1, 0, or 1 depending on whether the left argument is
> > +             less than, equal to, or greater than the right argument.
> > +   */
> > +  typedef int (*compare_function)(size_t *n, Key_type **a, Key_type 
> **b);
> > +
> > +  /**
> > +    Initialize the queue.
> > +
> > +    @param max_elements   The size of the queue.
> > +    @param offset_to_key  Offset to key in elements stored in the 
> queue.
>
> Didn't we agree to drop offset_to_key?

We didn't, but we do now.

>
> > +    @param max_at_top     Set to 1 if you want biggest element on top.
> > +    @param compare        Compare function for elements, takes 3 
> arguments.
> > +                          If NULL, we use 
> get_ptr_compare(compare_length).
> > +    @param compare_length Length of the data (i.e. the keys) used 
> for sorting.
> > +    @param keymaker       Function which generates keys for elements.
> > +    @param sort_param     Sort parameters.
> > +    @param sort_keys      Array of pointers to keys to sort.
> > +
> > +    @retval 0 OK, 1 Could not allocate memory.
> > +
> > +    We do *not* take ownership of any of the input pointer arguments.
> > +   */
> > +  int init(ha_rows max_elements, uint offset_to_key, bool max_at_top,
> > +           compare_function compare, size_t compare_length,
> > +           keymaker_function keymaker, Sort_param *sort_param,
> > +           Key_type **sort_keys);
> > +
> > +  /**
> > +    Pushes an element on the queue.
> > +    If the queue is already full, we discard one element.
> > +    Calls keymaker_function to generate a key for the element.
> > +
> > +    @param element        The element to be pushed.
> > +   */
> > +  void push(Element_type *element);
> > +
> > +  /**
> > +    Removes the top element from the queue.
> > +
> > +    @retval Pointer to the (key of the) removed element.
> > +   */
> > +  Key_type **pop()
> > +  {
> > +    // Don't return the extra element to the client code.
> > +    if (queue_is_full((&m_queue)))
> > +      queue_remove(&m_queue, 0);
> > +    return reinterpret_cast<Key_type**>(queue_remove(&m_queue, 0));
> > +  }
>
> I think we agreed to add a comment that said that pop() had been
> implemented for testing purposes, and is not fully tested.
>

done.
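
For reference, roughly how I expect the class to be driven from a gunit test
once offset_to_key is gone; the init() signature below is my assumption, not
the final code:

int *key_ptrs[10 + 1];            // 'limit' keys, plus room for one extra element (assumption)
Bounded_queue<int, int> queue;
EXPECT_EQ(0, queue.init(10,               // max_elements
                        true,             // max_at_top
                        int_ptr_compare,  // compare
                        sizeof(int),      // compare_length
                        &int_keymaker,    // keymaker
                        NULL,             // sort_param
                        key_ptrs));       // sort_keys
for (int ix= 0; ix < 100; ++ix)
{
  int data= rand();
  queue.push(&data);              // keymaker generates the key for each element
}
// pop() is there mainly for unit testing, and is not fully tested.
int *top_key= *queue.pop();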

> ...
>
> > +      /*
> > +        PQ has cost:
> > +        (insert + qsort) * log(queue size) / TIME_FOR_COMPARE_ROWID +
> > +        cost of file lookup afterwards.
> > +      */
> > +      const double pq_cost=
> > +        (PQ_slowness * num_rows + param->num_keys_per_buffer) *
> > +        log(param->max_rows + 1.0) / TIME_FOR_COMPARE_ROWID +
> > +        param->max_rows * table->file->scan_time();
>
> As discussed on Friday, param->num_keys_per_buffer == param->max_rows 
> + 1.0
> at this point.  I think it would be more obvious if just one of these
> variables are used in this formula.
>

OK
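
For the record, with num_keys_per_buffer == max_rows + 1 at this point, the
formula could then read something like:

const double pq_cost=
  (PQ_slowness * num_rows + param->max_rows + 1.0) *
  log(param->max_rows + 1.0) / TIME_FOR_COMPARE_ROWID +
  param->max_rows * table->file->scan_time();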

> ...
>
> > +
> > +/*
> > +  Some basic performance testing, to compute the overhead of 
> Bounded_queue.
> > +  Run with 'bounded_queue-t --disable-tap-output' to see the
> > +  millisecond output from Google Test.
> > + */
> > +const int num_rows= 10000;
> > +const int row_limit= 100;
> > +const int num_iterations= 1000;
> > +
> > +class PerfTestSmall : public ::testing::Test
> > +{
> > +public:
> > +  /*
> > +    The extra overhead of malloc/free should be part of the 
> measurement,
> > +    so we do not define the key container as a member here.
> > +  */
> > +  typedef Key_container<row_limit, int>  Container;
> > +  enum { limit= row_limit };
> > +};
> > +
> > +class PerfTestLarge : public ::testing::Test
> > +{
> > +public:
> > +  /*
> > +    The extra overhead of malloc/free should be part of the 
> measurement,
> > +    so we do not define the key container as a member here.
> > +  */
> > +  typedef Key_container<num_rows, int>  Container;
> > +  enum { limit= num_rows };
> > +};
> > +
> > +
> > +template<int limit>
> > +void insert_and_sort()
> > +{
> > +  typedef Key_container<limit, int>  Container;
> > +  for (int it= 0; it<  num_iterations; ++it)
> > +  {
> > +    Container *keys= new Container;
> > +    srand(0);
> > +    Bounded_queue<int, int>  queue;
> > +    EXPECT_EQ(0, queue.init(limit, 0, true, int_ptr_compare,
> > +                            sizeof(int),&int_keymaker, NULL, 
> keys->key_ptrs));
> > +    for (int ix= 0; ix<  limit; ++ix)
>
> Should not this loop do num_rows iterations?
> Otherwise, the name of the constants are a bit misleading.
> The way it is now, it should rather be num_rows_large instead of
> num_rows and num_rows_small instead of row_limit.

ah yes, thanks. See the corrected loop after the quoted code below.

>
> > +    {
> > +      int data= rand();
> > +      queue.push(&data);
> > +    }
> > +    my_string_ptr_sort((uchar*)&keys->key_ptrs[0],
> > +                       queue.num_elements(), sizeof(int));
> > +    delete keys;
> > +  }
> > +}
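
For concreteness, the loop with that fix applied (still a sketch; init() is
shown as in the current diff, and will also lose its offset_to_key argument
per the discussion above):

template<int limit>
void insert_and_sort()
{
  typedef Key_container<limit, int> Container;
  for (int it= 0; it < num_iterations; ++it)
  {
    Container *keys= new Container;
    srand(0);
    Bounded_queue<int, int> queue;
    EXPECT_EQ(0, queue.init(limit, 0, true, int_ptr_compare, sizeof(int),
                            &int_keymaker, NULL, keys->key_ptrs));
    for (int ix= 0; ix < num_rows; ++ix)   // push all rows; queue keeps 'limit' keys
    {
      int data= rand();
      queue.push(&data);
    }
    my_string_ptr_sort((uchar*) &keys->key_ptrs[0],
                       queue.num_elements(), sizeof(int));
    delete keys;
  }
}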
