List:Commits« Previous MessageNext Message »
From:marc.alff Date:August 8 2007 11:54pm
Subject:bk commit into 5.0 tree (malff:1.2491) BUG#30237
View as plain text  
Below is the list of changes that have just been committed into a local
5.0 repository of marcsql. When marcsql does a push these changes will
be propagated to the main repository and, within 24 hours after the
push, to the public repository.
For information on how to access the public repository
see http://dev.mysql.com/doc/mysql/en/installing-source-tree.html

ChangeSet@stripped, 2007-08-08 15:54:52-06:00, malff@weblab.(none) +2 -0
  Bug#30237 (Performance regression in boolean expressions)
  
  This is a performance bug, related to the parsing or 'OR' and 'AND' boolean
  expressions.
  
  Let N be the number of expressions involved in a OR (respectively AND).
  
  When N=1
  
  For example, "select 1" involve only 1 term: there is no OR operator.
  
  In 4.0 and 4.1, parsing expressions not involving OR had no overhead.
  In 5.0, parsing adds some overhead, with Select->expr_list.
  
  With this patch, the overhead introduced in 5.0 has been removed,
  so that performances for N=1 should be identical to the 4.0 performances,
  which are optimal (there is no code executed at all)
  
  The overhead in 5.0 was in fact affecting significantly some operations.
  For example, loading 1 Million rows into a table with INSERTs,
  for a table that has 100 columns, leads to parsing 100 Millions of
  expressions, which means that the overhead related to Select->expr_list
  is executed 100 Million times ...
  
  Considering that N=1 is by far the most probable expression,
  this case should be optimal.
  
  When N=2
  
  For example, "select a OR b" involves 2 terms in the OR operator.
  
  In 4.0 and 4.1, parsing expressions involving 2 terms created 1 Item_cond_or
  node, which is the expected result.
  In 5.0, parsing these expression also produced 1 node, but with some extra
  overhead related to Select->expr_list : creating 1 list in Select->expr_list
  and another in Item_cond::list is inefficient.
  
  With this patch, the overhead introduced in 5.0 has been removed
  so that performances for N=2 should be identical to the 4.0 performances.
  Note that the memory allocation uses the new (thd->mem_root) syntax
  directly.
  The cost of "is_cond_or" is estimated to be neglectable: the real problem
  of the performance degradation comes from unneeded memory allocations.
  
  When N>=3
  
  For example, "select a OR b OR c ...", which involves 3 or more terms.
  
  In 4.0 and 4.1, the parser had no significant cost overhead, but produced
  an Item tree which is difficult to evaluate / optimize during runtime.
  In 5.0, the parser produces a better Item tree, using the Item_cond
  constructor that accepts a list of children directly, but at an extra cost
  related to Select->expr_list.
  
  With this patch, the code is implemented to take the best of the two
  implementations:
  - there is no overhead with Select->expr_list
  - the Item tree generated is optimized and flattened.
  
  This is achieved by adding children nodes into the Item tree directly,
  with Item_cond::add(), which avoids the need for temporary lists and memory
  allocation
  
  Note that this patch also provide an extra optimization, that the previous
  code in 5.0 did not provide: expressions are flattened in the Item tree,
  based on what the expression already parsed is, and not based on the order
  in which rules are reduced.
  
  For example : "(a OR b) OR c", "a OR (b OR c)" would both be represented
  with 2 Item_cond_or nodes before this patch, and with 1 node only with this
  patch. The logic used is based on the mathematical properties of the OR
  operator (it's associative), and produces a simpler tree.
  
  Performance tests:
  
  Tests have been executed for N=1, that show an improvement.
  
  When executing repeatedly INSERT <100 values> into <blackhole table>,
  the timing is as follows (time mysqlslap ...)
  
  See the bug report for 30327 for the scripts used, these are not part of
  the patch.
  
  100,000 inserts of 100 values (10 Million expressions)
  Before
  real    0m57.767s
  user    0m3.088s
  sys     0m3.188s
  
  After
  real    0m50.722s
  user    0m3.332s
  sys     0m3.352s
  
  
  1,000,000 inserts of 100 values (100 Million expressions)
  Before
  real    9m14.565s
  user    0m33.390s
  sys     0m35.194s
  
  After
  real    8m30.286s
  user    0m33.410s
  sys     0m35.438s
  
  Performance measures for N=2 or N>=3 are not available.
  The code is expected to be faster (per code analysis).
  
  Review note:
  The grammar contains recursive rules around <expr>, which causes shift/reduce
  conflicts. The rules added for this patch are conflict free (verified
  independently), but existing ambiguities cause new S/R conflicts to be
  inferred, as is currently the case for all rules below <expr>
  (see sql_yacc.output).
  This patch is safe for the grammar.
  The grammar should be cleaned up independently of this change.

  sql/item_cmpfunc.h@stripped, 2007-08-08 15:54:45-06:00, malff@weblab.(none) +18 -0
    Improved performances for parsing boolean expressions

  sql/sql_yacc.yy@stripped, 2007-08-08 15:54:45-06:00, malff@weblab.(none) +94 -39
    Improved performances for parsing boolean expressions

diff -Nrup a/sql/item_cmpfunc.h b/sql/item_cmpfunc.h
--- a/sql/item_cmpfunc.h	2007-07-15 11:49:58 -06:00
+++ b/sql/item_cmpfunc.h	2007-08-08 15:54:45 -06:00
@@ -1361,6 +1361,7 @@ public:
   Item_cond(List<Item> &nlist)
     :Item_bool_func(), list(nlist), abort_on_null(0) {}
   bool add(Item *item) { return list.push_back(item); }
+  bool add_at_head(Item *item) { return list.push_front(item); }
   void add_at_head(List<Item> *nlist) { list.prepand(nlist); }
   bool fix_fields(THD *, Item **ref);
 
@@ -1554,6 +1555,15 @@ public:
   Item *neg_transformer(THD *thd);
 };
 
+inline bool is_cond_and(Item *item)
+{
+  if (item->type() != Item::COND_ITEM)
+    return FALSE;
+
+  Item_cond *cond_item= (Item_cond*) item;
+  return (cond_item->functype() == Item_func::COND_AND_FUNC);
+}
+
 class Item_cond_or :public Item_cond
 {
 public:
@@ -1575,6 +1585,14 @@ public:
   Item *neg_transformer(THD *thd);
 };
 
+inline bool is_cond_or(Item *item)
+{
+  if (item->type() != Item::COND_ITEM)
+    return FALSE;
+
+  Item_cond *cond_item= (Item_cond*) item;
+  return (cond_item->functype() == Item_func::COND_OR_FUNC);
+}
 
 /*
   XOR is Item_cond, not an Item_int_func because we could like to
diff -Nrup a/sql/sql_yacc.yy b/sql/sql_yacc.yy
--- a/sql/sql_yacc.yy	2007-07-23 16:23:33 -06:00
+++ b/sql/sql_yacc.yy	2007-08-08 15:54:45 -06:00
@@ -461,7 +461,7 @@ bool my_yyoverflow(short **a, YYSTYPE **
   Currently there is 251 shift/reduce conflict. We should not introduce
   new conflicts any more.
 */
-%expect 251
+%expect 255
 
 %token  END_OF_INPUT
 
@@ -1066,7 +1066,8 @@ bool my_yyoverflow(short **a, YYSTYPE **
 %type <item>
 	literal text_literal insert_ident order_ident
 	simple_ident select_item2 expr opt_expr opt_else sum_expr in_sum_expr
-	variable variable_aux bool_term bool_factor bool_test bool_pri 
+	variable variable_aux bool_xterm bool_term bool_factor
+        bool_test bool_pri
 	predicate bit_expr bit_term bit_factor value_expr term factor
 	table_wild simple_expr udf_expr
 	expr_or_default set_expr_or_default interval_expr
@@ -4464,53 +4465,107 @@ optional_braces:
 	| '(' ')' {};
 
 /* all possible expressions */
-expr:	
-	  bool_term { Select->expr_list.push_front(new List<Item>); }
-          bool_or_expr
-          {
-            List<Item> *list= Select->expr_list.pop();
-            if (list->elements)
-            {
-              list->push_front($1);
-              $$= new Item_cond_or(*list);
-              /* optimize construction of logical OR to reduce
-                 amount of objects for complex expressions */
+expr:
+          bool_xterm
+        | expr or bool_xterm
+          {
+            Item_cond_or *item1;
+            Item_cond_or *item3;
+            if (is_cond_or($1))
+            {
+              item1= (Item_cond_or*) $1;
+              if (is_cond_or($3))
+              {
+                item3= (Item_cond_or*) $3;
+                /*
+                  (X1 OR X2) OR (Y1 OR Y2) ==> OR (X1, X2, Y1, Y2)
+                */
+                item3->add_at_head(item1->argument_list());
+                $$ = $3;
+              }
+              else
+              {
+                /*
+                  (X1 OR X2) OR Y ==> OR (X1, X2, Y)
+                */
+                item1->add($3);
+                $$ = $1;
+              }
             }
             else
-              $$= $1;
-            delete list;
+            {
+              if (is_cond_or($3))
+              {
+                item3= (Item_cond_or*) $3;
+                /*
+                  X OR (Y1 OR Y2) ==> OR (X, Y1, Y2)
+                */
+                item3->add_at_head($1);
+                $$ = $3;
+              }
+              else
+              {
+                /* X OR Y */
+                $$ = new (YYTHD->mem_root) Item_cond_or($1, $3);
+              }
+            }
           }
-	;
+        ;
 
-bool_or_expr:
-	/* empty */
-        | bool_or_expr or bool_term
-          { Select->expr_list.head()->push_back($3); }
+bool_xterm:
+          bool_term
+        | bool_xterm XOR bool_term
+          {
+            /* XOR is a proprietary extension */
+            $$ = new (YYTHD->mem_root) Item_cond_xor($1, $3);
+          }
         ;
 
 bool_term:
-	bool_term XOR bool_term { $$= new Item_cond_xor($1,$3); }
-	| bool_factor { Select->expr_list.push_front(new List<Item>); }
-          bool_and_expr
-          {
-            List<Item> *list= Select->expr_list.pop();
-            if (list->elements)
-            {
-              list->push_front($1);
-              $$= new Item_cond_and(*list);
-              /* optimize construction of logical AND to reduce
-                 amount of objects for complex expressions */
+          bool_factor
+        | bool_term and bool_factor
+          {
+            Item_cond_and *item1;
+            Item_cond_and *item3;
+            if (is_cond_and($1))
+            {
+              item1= (Item_cond_and*) $1;
+              if (is_cond_and($3))
+              {
+                item3= (Item_cond_and*) $3;
+                /*
+                  (X1 AND X2) AND (Y1 AND Y2) ==> AND (X1, X2, Y1, Y2)
+                */
+                item3->add_at_head(item1->argument_list());
+                $$ = $3;
+              }
+              else
+              {
+                /*
+                  (X1 AND X2) AND Y ==> AND (X1, X2, Y)
+                */
+                item1->add($3);
+                $$ = $1;
+              }
             }
             else
-              $$= $1;
-            delete list;
+            {
+              if (is_cond_and($3))
+              {
+                item3= (Item_cond_and*) $3;
+                /*
+                  X AND (Y1 AND Y2) ==> AND (X, Y1, Y2)
+                */
+                item3->add_at_head($1);
+                $$ = $3;
+              }
+              else
+              {
+                /* X AND Y */
+                $$ = new (YYTHD->mem_root) Item_cond_and($1, $3);
+              }
+            }
           }
-	;
-
-bool_and_expr:
-	/* empty */
-        | bool_and_expr and bool_factor
-          { Select->expr_list.head()->push_back($3); }
         ;
 
 bool_factor:
Thread
bk commit into 5.0 tree (malff:1.2491) BUG#30237marc.alff8 Aug