List:Commits« Previous MessageNext Message »
From:Øystein Grøvlen Date:March 17 2010 2:27pm
Subject:Re: bzr commit into mysql-pe branch (Alexey.Kopytov:3678) Bug#8433
View as plain text  
Alexey Kopytov wrote:
 > Hi Øystein,
 >
 > On 10.03.10 15:05, Øystein Grøvlen wrote:
 >>>> - I wonder whether it in some cases would be an idea to optimize for
 >>>> small numbers since I would guess arithmetic operations on small
 >>>> numbers are more common than numbers that may generate overflow.
 >>>
 >>> Could you explain the reasons behind this guess? I think that for very
 >>> basic operations like these it is hard to define a 'typical' usage
 >>> pattern. It really depends on user data, and we know nothing about its
 >>> distribution.
 >>
 >> I think it pretty evident that the are a lot more domains with
 >> cardinalities less than 2^32 than above. Think for examples of all
 >> databases that records quantities of items (e.g., goods in stock). Very
 >> few of them will ever need such big numbers. Except for in scientific
 >> databases, I have difficulties thinking of applications which need such
 >> big integers, and when such big numbers are needed they will be
 >> represented as decimal (e.g., money) or floats.
 >>
 >
 > Well, think for example of all databases that record number of bytes.
 > Telecom applications, file/photo/video sharing sites. I think it's
 > pretty easy to come up with some aggregate queries in those domains that
 > deal with big numbers. So, no, not that evident.

Still for such databases, I will bet that most arithmetic would be for 
sizes less than 2^32.  Anyhow, it is probably not a very big deal, so do 
as you please.

 >
 >>>
 >>>> For example, in Item_func_mul::int_op() it seems that the path
 >>>> could be made shorter if it is detected earlier that both numbers
 >>>> are positive and less than 2^32.
 >>>>
 >>
 >>>
 >>> Yes, but that would surely make the code path for higher numbers
 >>> longer. It does not make any sense to me in light of the above.
 >>>
 >>>> - There are a bit duplication of code in this patch. I urge you to
 >>>> consider whether some of this duplication could be eliminated.
 >>>> For example, Item_func_div::int_op() and Item_func_mod::int_op()
 >>>> have very much in common.
 >>>>
 >>>
 >>> I am note sure what you meant here. Item_func_div()::int_op() has
 >>> absolutely nothing in common with Item_func_mod::int_op().
 >>>
 >>> But even assuming you were actually talking about
 >>> Item_func_int_div::val_int(), I only see a few lines duplicated in
 >>> Item_func_mod::int_op() where we basically just negate the arguments
 >>> keeping their original sign.
 >>
 >> Yes, sorry, I mean Item_func_int_div.
 >>
 >> AFAICT, everything in Item_func_mod::int_op() except the last two
 >> statements, is identical to code in Item_func_int_div::int_op().
 >>
 >
 > Here's a diff for your convenience: http://pastie.org/871065
 >
 > I am pretty sure any attempt to split common parts of those functions
 > into a separate function will actually create more code than it will
 > eliminate.

IMO, the problem is not the number of lines, but duplication of logic. 
Anyhow, with a function like this (the name is probably not the best):

bool Item_func::int_div_check(ulonglong *uval0, ulonglong* uval1,
                               bool *val0_negative, bool *val1_negative)
{
   longlong val0= args[0]->val_int();
   longlong val1= args[1]->val_int();

   if ((null_value= args[0]->null_value || args[1]->null_value))
     return true; /* purecov: inspected */
   if (val1 == 0)
   {
     signal_divide_by_null();
     return true;
   }

   *val0_negative= !args[0]->unsigned_flag && val0 < 0;
   *val1_negative= !args[1]->unsigned_flag && val1 < 0;
   *uval0= (ulonglong) (*val0_negative ? -val0 : val0);
   *uval1= (ulonglong) (*val1_negative ? -val1 : val1);
   return false;
}

Item_func_mod::int_op() may look like this:

longlong Item_func_mod::int_op()
{
   DBUG_ASSERT(fixed == 1);

   ulonglong uval0, uval1, res;
   bool val0_negative, val1_negative;
   if (int_div_check(&uval0, &uval1, &val0_negative, &val1_negative))
     return 0;
   res= uval0 % uval1;
   return check_integer_overflow(val0_negative ? -(longlong) res : res,
                                 !val0_negative);
}

and the same function can be reused in Item_func_int_div::val_int():

longlong Item_func_int_div::val_int()
{
   DBUG_ASSERT(fixed == 1);

   /*
     Perform division using DECIMAL math if either of the operands has a
     non-integer type
   */
   if (args[0]->result_type() != INT_RESULT ||
       args[1]->result_type() != INT_RESULT)
   {
     ...
   }

   bool val0_negative, val1_negative, res_negative;
   ulonglong uval0, uval1, res;

   if (int_div_check(&uval0, &uval1, &val0_negative, &val1_negative))
     return 0;

   res_negative= val0_negative != val1_negative;
   res= uval0 / uval1;
   if (res_negative)
   {
     if (res > (ulonglong) LONGLONG_MAX)
     {
       my_error(ER_DATA_OUT_OF_RANGE, MYF(0), "BIGINT");
       return 0;
     }
     res= (ulonglong) (-(longlong) res);
   }
   return check_integer_overflow(res, !res_negative);
}


Even more code duplication could be removed by using a function like this:

bool convert_to_ulonglong(Item* arg, ulonglong *uval, bool *neg)
{
   longlong val= arg->val_int();

   if (arg->null_value)
     return true; /* purecov: inspected */

   *neg= !arg->unsigned_flag && val < 0;
   *uval= (ulonglong) (neg ? -val : val);
   return false;
}


Then int_div_check() becomes:

bool Item_func::int_div_check(ulonglong *uval0, ulonglong* uval1,
                               bool *val0_negative, bool *val1_negative)
{
   if (null_value= (convert_to_ulonglong(args[0], uval0, val0_negative)||
                    convert_to_ulonglong(args[1], uval1, val1_negative))
     return true; /* purecov: inspected */
   if (uval1 == 0)
   {
     signal_divide_by_null();
     return true;
   }
}

I am not requiring that you do these changes, but please consider it.

 >
 >>
 >>>
 >>>> - Have you run coverage analysis to make sure all branches of this
 >>>> patch are covered by tests? I think that is necessary for such
 >>>> basic functionality.
 >>>>
 >>>
 >>> No, but I made sure it 'manually' when creating the test cases.
 >>> Basically, every single branch in the code corresponds to some test
 >>> case in the patch.
 >>
 >> I am not 100% convinced by this argument. It is human to error ...
 >>
 >
 > Ok, I have run the gcov tests. There were a few lines not covered by the
 > tests. None of those branches could lead to overflows. So I had to add a
 > few non-overflowing cases to the tests just to make gcov happy. I will
 > recommit the patch with those tests included before pushing.

IIUC, you are saying that those tests are not part of the latest
patch, but will be added before you push?  I am OK with that.

-- 
Øystein
Thread
bzr commit into mysql-pe branch (Alexey.Kopytov:3678) Bug#8433Alexey Kopytov17 Dec
  • Re: bzr commit into mysql-pe branch (Alexey.Kopytov:3678) Bug#8433Øystein Grøvlen23 Dec
    • Re: bzr commit into mysql-pe branch (Alexey.Kopytov:3678) Bug#8433Alexey Kopytov4 Mar
      • Re: bzr commit into mysql-pe branch (Alexey.Kopytov:3678) Bug#8433Øystein Grøvlen10 Mar
        • Re: bzr commit into mysql-pe branch (Alexey.Kopytov:3678) Bug#8433Alexey Kopytov15 Mar
          • Re: bzr commit into mysql-pe branch (Alexey.Kopytov:3678) Bug#8433Øystein Grøvlen17 Mar
            • Re: bzr commit into mysql-pe branch (Alexey.Kopytov:3678) Bug#8433Alexey Kopytov17 Mar