MySQL Lists are EOL. Please join:

List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:December 1 2010 8:49pm
Subject:Re: #57932 [Com]: query with AVG(DISTINCT) returns NULL if last aggregated
value was NULL
View as plain text  
Hello,

Georgi Kodinov a écrit, Le 01.12.2010 14:40:
> On 01.12.2010, at 15:36, Guilhem Bichot wrote:
>> Bug Database a écrit, Le 01.12.2010 13:38:
>>> Guilhem,
>>> Why not add this as members of the aggregator ? e.g.
> Aggregator_distinct::arg_val_decimal(my_decimal * value)
>>> {
>>>  if (use_distinct_values)
>>>    return table->field[0]->val_decimal (value);
>>>  else
>>>   return item_sum->args[0]->val_decimal(value);
>>> }
>>> Aggregator_distinct::arg_val_real()
>>> {
>>>  if (use_distinct_values)
>>>    return table->field[0]->val_real ();
>>>  else
>>>   return item_sum->args[0]->val_real();
>>> }
>>> Aggregator_distinct::arg_is_null()
>>> {
>>>  if (use_distinct_values)
>>>    return  table->field[0]->is_null()    else
>>>   return item_sum->args[0]->null_value;
>>> }
>>> And add the corresponding members in Aggregator and
>>> Aggregator_simple(). Maybe we can even make use_distinct_values private ?
>>> In this way we can then remove some if()s from Item_sum_sum
>> It is true that Item_sum_sum::add() would be shortened.
>>
>> If I understand well, we would have this code in
>> Item_sum_avg::add()
>> {
>>  Item_sum_sum::add();
>>  if (!aggr->arg_is_null()) count++;
>> }
>> And Item_sum_sum::add() would also have calls to aggr->arg_is_null():
>>  if (!aggr->arg_is_null()) include it in the sum;
>>
>> But I find it a bit inefficient to have repeated logic: one test of NULL to
> decide whether the added number should affect the sum, and one test of NULL to decide
> whether the added number should affect the count. Whereas in the first patch which I
> proposed (where "count++" is moved to Item_sum_sum::add()), we get both operations in one
> test (the patch doesn't add any if(), it squeezes count++ inside an existing if()).
>>
>> We could imagine a mix of the two ideas: "count" moves to Item_sum_sum and is
> maintained in Item_sum_sum::add() (idea of my first patch, no new if()), and we introduce
> Aggregator_distinct::arg_val_decimal/etc to simplify Item_sum_sum::add(). The best of both
> worlds.
>>
>> What do you think of this?
> 
> I don't like when we add functionality that's needed by Item_sum_avg to
> Item_sum_sum.

In theory you are very right. In practice, this added functionality is 
essentially this change to Item_sum_sum::add():
      if (!arg_is_null)
+    {
        null_value= 0;
+      count++;
+    }

which looks very small to me. That is, Item_sum_sum::add() already does 
the job of finding out the NULLness, it already has all if()s, I use 
this opportunity to calculate the count at the right moment where we 
have all the data.

 > Remember, Item_sum_sum is a class that can be used standalone.

Yes.

 > I don't think it's wise to add functionality to it that it doesn't
 > need.
> Specially when this functionality stands on a critical path. 

It's just a "count++", no if() added.
If we talk about critical path and thus performance, let's contrast this 
with the idea of having
   Item_sum_avg::add()
   {
     Item_sum_sum::add();
     if (!aggr->arg_is_null()) count++;
  }
It adds:
- an if()
- a pointer dereference
- a function call (aggr->arg_is_null()) (worse, a virtual one)
- which itself does:
   if (use_distinct_values)
     return  table->field[0]->is_null()    else
   return item_sum->args[0]->null_value;
i.e. an if(), pointer dereferences, and possibly a call to is_null() 
which looks like
   inline bool is_null(my_ptrdiff_t row_offset= 0)
   { return null_ptr ? (null_ptr[row_offset] & null_bit ? 1 : 0) : 
table->null_row; }
which has an if(), an array subscription, an arithmetic operation, yes 
another if() depending on the result of this operation...
All that is surely way more than an added count++.

Basically, I believe that Item_sum_sum already takes all steps we need 
in Item_sum_avg, so it's smarter to let him update a count. I don't see 
this as a hack. It's true that Item_sum_sum shouldn't care about 
"count", but is best-placed to maintain it cheaply and still cleanly.

> Besides : what's wrong with checking for NULL ? it's a normal action done everywhere
 > else in the server.

Checking for NULL is not wrong per se. I was saying that having:

Item_sum_sum::add()
{
    if (aggr->arg_is_null()) increase sum;
}
Item_sum_avg::add()
{
   Item_sum_sum::add();
   if (aggr->arg_is_null() increase count;
}

wasn't that efficient compared to

Item_sum_sum::add()
{
    if (aggr->arg_is_null()) increase sum and count;
}
and having no Item_sum_avg::add().
If you look at the first patch which I proposed, by moving "count" I 
eliminate Item_sum_avg::add() and Item_sum_avg::clear(), which I take as 
a sign of "improved design".

That's why I proposed to retain your idea of using virtual functions to 
Item_sum_sum::add(), and at the same time put "count" in Item_sum_sum.

If you believe that having "count" in Item_sum_sum is not possible, and 
neither of the two other patches I proposed is suitable, I propose to 
ask for the opinion of a second reviewer (I'll find one in my team soon).
Thread
Re: #57932 [Com]: query with AVG(DISTINCT) returns NULL if last aggregatedvalue was NULLGuilhem Bichot1 Dec
Re: #57932 [Com]: query with AVG(DISTINCT) returns NULL if last aggregatedvalue was NULLGuilhem Bichot1 Dec