List: | Commits | « Previous MessageNext Message » | |

From: | Guilhem Bichot | Date: | December 1 2010 8:49pm |

Subject: | Re: #57932 [Com]: query with AVG(DISTINCT) returns NULL if last aggregated value was NULL | ||

View as plain text |

Hello, Georgi Kodinov a écrit, Le 01.12.2010 14:40: > On 01.12.2010, at 15:36, Guilhem Bichot wrote: >> Bug Database a écrit, Le 01.12.2010 13:38: >>> Guilhem, >>> Why not add this as members of the aggregator ? e.g. > Aggregator_distinct::arg_val_decimal(my_decimal * value) >>> { >>> if (use_distinct_values) >>> return table->field[0]->val_decimal (value); >>> else >>> return item_sum->args[0]->val_decimal(value); >>> } >>> Aggregator_distinct::arg_val_real() >>> { >>> if (use_distinct_values) >>> return table->field[0]->val_real (); >>> else >>> return item_sum->args[0]->val_real(); >>> } >>> Aggregator_distinct::arg_is_null() >>> { >>> if (use_distinct_values) >>> return table->field[0]->is_null() else >>> return item_sum->args[0]->null_value; >>> } >>> And add the corresponding members in Aggregator and >>> Aggregator_simple(). Maybe we can even make use_distinct_values private ? >>> In this way we can then remove some if()s from Item_sum_sum >> It is true that Item_sum_sum::add() would be shortened. >> >> If I understand well, we would have this code in >> Item_sum_avg::add() >> { >> Item_sum_sum::add(); >> if (!aggr->arg_is_null()) count++; >> } >> And Item_sum_sum::add() would also have calls to aggr->arg_is_null(): >> if (!aggr->arg_is_null()) include it in the sum; >> >> But I find it a bit inefficient to have repeated logic: one test of NULL to > decide whether the added number should affect the sum, and one test of NULL to decide > whether the added number should affect the count. Whereas in the first patch which I > proposed (where "count++" is moved to Item_sum_sum::add()), we get both operations in one > test (the patch doesn't add any if(), it squeezes count++ inside an existing if()). >> >> We could imagine a mix of the two ideas: "count" moves to Item_sum_sum and is > maintained in Item_sum_sum::add() (idea of my first patch, no new if()), and we introduce > Aggregator_distinct::arg_val_decimal/etc to simplify Item_sum_sum::add(). The best of both > worlds. >> >> What do you think of this? > > I don't like when we add functionality that's needed by Item_sum_avg to > Item_sum_sum. In theory you are very right. In practice, this added functionality is essentially this change to Item_sum_sum::add(): if (!arg_is_null) + { null_value= 0; + count++; + } which looks very small to me. That is, Item_sum_sum::add() already does the job of finding out the NULLness, it already has all if()s, I use this opportunity to calculate the count at the right moment where we have all the data. > Remember, Item_sum_sum is a class that can be used standalone. Yes. > I don't think it's wise to add functionality to it that it doesn't > need. > Specially when this functionality stands on a critical path. It's just a "count++", no if() added. If we talk about critical path and thus performance, let's contrast this with the idea of having Item_sum_avg::add() { Item_sum_sum::add(); if (!aggr->arg_is_null()) count++; } It adds: - an if() - a pointer dereference - a function call (aggr->arg_is_null()) (worse, a virtual one) - which itself does: if (use_distinct_values) return table->field[0]->is_null() else return item_sum->args[0]->null_value; i.e. an if(), pointer dereferences, and possibly a call to is_null() which looks like inline bool is_null(my_ptrdiff_t row_offset= 0) { return null_ptr ? (null_ptr[row_offset] & null_bit ? 1 : 0) : table->null_row; } which has an if(), an array subscription, an arithmetic operation, yes another if() depending on the result of this operation... All that is surely way more than an added count++. Basically, I believe that Item_sum_sum already takes all steps we need in Item_sum_avg, so it's smarter to let him update a count. I don't see this as a hack. It's true that Item_sum_sum shouldn't care about "count", but is best-placed to maintain it cheaply and still cleanly. > Besides : what's wrong with checking for NULL ? it's a normal action done everywhere > else in the server. Checking for NULL is not wrong per se. I was saying that having: Item_sum_sum::add() { if (aggr->arg_is_null()) increase sum; } Item_sum_avg::add() { Item_sum_sum::add(); if (aggr->arg_is_null() increase count; } wasn't that efficient compared to Item_sum_sum::add() { if (aggr->arg_is_null()) increase sum and count; } and having no Item_sum_avg::add(). If you look at the first patch which I proposed, by moving "count" I eliminate Item_sum_avg::add() and Item_sum_avg::clear(), which I take as a sign of "improved design". That's why I proposed to retain your idea of using virtual functions to Item_sum_sum::add(), and at the same time put "count" in Item_sum_sum. If you believe that having "count" in Item_sum_sum is not possible, and neither of the two other patches I proposed is suitable, I propose to ask for the opinion of a second reviewer (I'll find one in my team soon).

Thread | ||
---|---|---|

• Re: #57932 [Com]: query with AVG(DISTINCT) returns NULL if last aggregatedvalue was NULL | Guilhem Bichot | 1 Dec |

• Re: #57932 [Com]: query with AVG(DISTINCT) returns NULL if last aggregatedvalue was NULL | Guilhem Bichot | 1 Dec |