Alexey Kopytov wrote:
> Hi Øystein,
>
> On 10.03.10 15:05, Øystein Grøvlen wrote:
>>>> - I wonder whether it in some cases would be an idea to optimize for
>>>> small numbers since I would guess arithmetic operations on small
>>>> numbers are more common than numbers that may generate overflow.
>>>
>>> Could you explain the reasons behind this guess? I think that for very
>>> basic operations like these it is hard to define a 'typical' usage
>>> pattern. It really depends on user data, and we know nothing about its
>>> distribution.
>>
>> I think it pretty evident that the are a lot more domains with
>> cardinalities less than 2^32 than above. Think for examples of all
>> databases that records quantities of items (e.g., goods in stock). Very
>> few of them will ever need such big numbers. Except for in scientific
>> databases, I have difficulties thinking of applications which need such
>> big integers, and when such big numbers are needed they will be
>> represented as decimal (e.g., money) or floats.
>>
>
> Well, think for example of all databases that record number of bytes.
> Telecom applications, file/photo/video sharing sites. I think it's
> pretty easy to come up with some aggregate queries in those domains that
> deal with big numbers. So, no, not that evident.
Still for such databases, I will bet that most arithmetic would be for
sizes less than 2^32. Anyhow, it is probably not a very big deal, so do
as you please.
>
>>>
>>>> For example, in Item_func_mul::int_op() it seems that the path
>>>> could be made shorter if it is detected earlier that both numbers
>>>> are positive and less than 2^32.
>>>>
>>
>>>
>>> Yes, but that would surely make the code path for higher numbers
>>> longer. It does not make any sense to me in light of the above.
>>>
>>>> - There are a bit duplication of code in this patch. I urge you to
>>>> consider whether some of this duplication could be eliminated.
>>>> For example, Item_func_div::int_op() and Item_func_mod::int_op()
>>>> have very much in common.
>>>>
>>>
>>> I am note sure what you meant here. Item_func_div()::int_op() has
>>> absolutely nothing in common with Item_func_mod::int_op().
>>>
>>> But even assuming you were actually talking about
>>> Item_func_int_div::val_int(), I only see a few lines duplicated in
>>> Item_func_mod::int_op() where we basically just negate the arguments
>>> keeping their original sign.
>>
>> Yes, sorry, I mean Item_func_int_div.
>>
>> AFAICT, everything in Item_func_mod::int_op() except the last two
>> statements, is identical to code in Item_func_int_div::int_op().
>>
>
> Here's a diff for your convenience: http://pastie.org/871065
>
> I am pretty sure any attempt to split common parts of those functions
> into a separate function will actually create more code than it will
> eliminate.
IMO, the problem is not the number of lines, but duplication of logic.
Anyhow, with a function like this (the name is probably not the best):
bool Item_func::int_div_check(ulonglong *uval0, ulonglong* uval1,
bool *val0_negative, bool *val1_negative)
{
longlong val0= args[0]->val_int();
longlong val1= args[1]->val_int();
if ((null_value= args[0]->null_value || args[1]->null_value))
return true; /* purecov: inspected */
if (val1 == 0)
{
signal_divide_by_null();
return true;
}
*val0_negative= !args[0]->unsigned_flag && val0 < 0;
*val1_negative= !args[1]->unsigned_flag && val1 < 0;
*uval0= (ulonglong) (*val0_negative ? -val0 : val0);
*uval1= (ulonglong) (*val1_negative ? -val1 : val1);
return false;
}
Item_func_mod::int_op() may look like this:
longlong Item_func_mod::int_op()
{
DBUG_ASSERT(fixed == 1);
ulonglong uval0, uval1, res;
bool val0_negative, val1_negative;
if (int_div_check(&uval0, &uval1, &val0_negative, &val1_negative))
return 0;
res= uval0 % uval1;
return check_integer_overflow(val0_negative ? -(longlong) res : res,
!val0_negative);
}
and the same function can be reused in Item_func_int_div::val_int():
longlong Item_func_int_div::val_int()
{
DBUG_ASSERT(fixed == 1);
/*
Perform division using DECIMAL math if either of the operands has a
non-integer type
*/
if (args[0]->result_type() != INT_RESULT ||
args[1]->result_type() != INT_RESULT)
{
...
}
bool val0_negative, val1_negative, res_negative;
ulonglong uval0, uval1, res;
if (int_div_check(&uval0, &uval1, &val0_negative, &val1_negative))
return 0;
res_negative= val0_negative != val1_negative;
res= uval0 / uval1;
if (res_negative)
{
if (res > (ulonglong) LONGLONG_MAX)
{
my_error(ER_DATA_OUT_OF_RANGE, MYF(0), "BIGINT");
return 0;
}
res= (ulonglong) (-(longlong) res);
}
return check_integer_overflow(res, !res_negative);
}
Even more code duplication could be removed by using a function like this:
bool convert_to_ulonglong(Item* arg, ulonglong *uval, bool *neg)
{
longlong val= arg->val_int();
if (arg->null_value)
return true; /* purecov: inspected */
*neg= !arg->unsigned_flag && val < 0;
*uval= (ulonglong) (neg ? -val : val);
return false;
}
Then int_div_check() becomes:
bool Item_func::int_div_check(ulonglong *uval0, ulonglong* uval1,
bool *val0_negative, bool *val1_negative)
{
if (null_value= (convert_to_ulonglong(args[0], uval0, val0_negative)||
convert_to_ulonglong(args[1], uval1, val1_negative))
return true; /* purecov: inspected */
if (uval1 == 0)
{
signal_divide_by_null();
return true;
}
}
I am not requiring that you do these changes, but please consider it.
>
>>
>>>
>>>> - Have you run coverage analysis to make sure all branches of this
>>>> patch are covered by tests? I think that is necessary for such
>>>> basic functionality.
>>>>
>>>
>>> No, but I made sure it 'manually' when creating the test cases.
>>> Basically, every single branch in the code corresponds to some test
>>> case in the patch.
>>
>> I am not 100% convinced by this argument. It is human to error ...
>>
>
> Ok, I have run the gcov tests. There were a few lines not covered by the
> tests. None of those branches could lead to overflows. So I had to add a
> few non-overflowing cases to the tests just to make gcov happy. I will
> recommit the patch with those tests included before pushing.
IIUC, you are saying that those tests are not part of the latest
patch, but will be added before you push? I am OK with that.
--
Øystein