List:Commits« Previous MessageNext Message »
From:Sven Sandberg Date:January 14 2011 6:50pm
Subject:Re: bzr commit into mysql-5.1-bugteam branch (zhenxing.he:3418) Bug#42415
View as plain text  
Hi Zhenxing,

This is a partial review, let's discuss it more on IRC.

This is a very difficult bug and you did a very detailed and analysis. 
You also implemented the complicated algorithm that solves it.

There is still the the unresolved issue of how we can call fix_fields() 
before decide_logging_format(). I also have three suggestions to improve 
the implementation:

  (1) It seems you added a condition that marks the table as ordered 
when there is an ORDER BY on every column. I may be wrong but I think 
this is not correct.

  (2) I think you have implemented the algorithm I suggested in the bug 
report on 29 Oct. Please see my comment on 3 Nov: there is a more 
efficient algorithm.

  (3) This implementation does a lot of allocation. I think it's 
possible to optimize away that by using a more compact graph representation.

On 12/09/2010 05:18 AM, He Zhenxing wrote:
> === modified file 'sql/'
> --- a/sql/	2010-05-27 15:13:53 +0000
> +++ b/sql/	2010-12-09 04:18:42 +0000
> @@ -17162,6 +17178,457 @@ bool JOIN::change_result(select_result *
> +/*
> +  And-or graph truth propogation algorithm is used to calculate if the
> +  statement is ordered or not.
> +
> +  Nodes:
> +    JoinNode - And node, this is the root, successors are a list of
> +              ConstOrderedTableNode.
> +    ConstOrderedTableNode - Or node, have two TableNode as successors,
> +              one for ordered table, and the other for constant table.
> +    TableNode - Or node, have a list of KeyNode and one AllColumnsNode
> +              as successors.
> +    KeyNode - And node, successors are a list of ColumnNode that corresponding
> +              to the fields of the key.
> +    AllColumnsNode - And node, successors are a list of ColumnNode of all
> +              fields of the table.
> +    ColumnNode - Or node, represent a field of the table, it will take a
> +              TableNode as a successor, which means if a table is true
> +              (const or ordered), the all its columns are const or ordered.
> +              Successors to other column nodes will be added mutually if
> +              there is an equation to that field in the condition. The
> +              column nodes for fields that are in the ORDER BY or are
> +              equal to constants are added to the init_nodes of the JoinNode,
> +              which is used as the initial true values of the propogation.

(1) If I understand correctly, you have added the following condition 
(the AllColumnsNode):

"If there are ORDER BY clauses that cover all columns, then the table is 
marked as ordered".

However, I think this is not correct. ORDER BY clauses only make the 
table deterministic if there is a key. For example, a column may be 
ordered using a case-insensitive collation. An example:

   CREATE TABLE t1 (a TEXT COLLATE utf8_general_ci);
   INSERT INTO t1 VALUES ("a"), ("A");

+bool JoinNode::is_ordered()
+  List<Node> nodes(init_nodes);
+  List_iterator<Node> it(nodes);
+  Node *node;
+  while ((node= it++))
+  {
+    node->todo--;
+    if (node->todo == 0)
+    {
+      if (node == this)
+        return true;
+      List_iterator<Node> pit(node->predecessors);
+      Node *pnode;
+      while ((pnode= pit++))
+      {
+        if (pnode->todo)
+          nodes.push_back(pnode);
+      }
+    }
+  }
+  return false;

(2) I think the above implements the algorithm from my comment on the 
bug on 29 Oct. Please see the comment from 3 Nov: I think that algorithm 
has better space requirements.

(3) This implementation has to allocate two objects for every node in 
the graph (the Node object and a List<Node> object). I also think that 
the List class probably allocates one object per element; in that case 
there is also one allocated object per edge in the graph and one for 
each iteration of the algorithm. I think this may add both time and 
space overhead that is significant. I think we can reduce this to only 4 
allocations altogether, and at most 12 bytes per node + 4 bytes per 
edge. This should reduce both time and space requirements. Here's the idea:

  - Each node is identified by an integer. The root node is number 0.

  - The only data we need for each node is one integer: the "todo"

  - The graph structure can be represented using two integer lists. The
    first list contains all successor lists concatenated into one big
    flat list. The second list is indexed by node numbers and gives for
    each node the offset in the first list where the node's successors
    are stored.

  - The set S is represented as a list. Elements are added and removed
    from S in a stack-like way, i.e., pushed to the end when added and
    popped from the end when removed.

So the following data types can be used:

     The number of nodes in the graph.
   int n_nodes;
     The 'todo' array is indexed by node numbers. If N is a node, then
     todo[N] contains the number of successors of N that need to be
     marked deterministic before N is marked true.
   int *todo;
     Lists the successors of all nodes. This is a big flat list:
     first all successors of node 0, then all successors of node 1,
   int *successor;
     If N is a node, then node_successor[N] contains the index into
     the 'successor' array where the first successor of N is stored.
   int *node_successor;
     The 'dirty_nodes' array is a stack used during the evaluation
     of the and-or graph. It contains all nodes that have been marked
     as true where the truth value has not yet been propagated to
     the successors. 'n_dirty_nodes' is the number of elements
     in 'dirty_nodes'.
   int n_dirty_nodes;
   int *dirty_nodes;

The algorithm can then be implemented like:

   while (n_dirty_nodes)
     int node= dirty_nodes[--n_dirty_nodes];
     int i_end= node_successor[node + 1];
     for (int i= node_successor[node]; i < i_end; i++) {
       int succ= successor[i];
       if (todo[succ]) {
         if (todo[succ] == 0) {
           if (succ == 0) // the root node
             return TRUE;
           dirty_nodes[n_dirty_nodes++]= succ;
   return FALSE;

I think these data structures can be initialized as follows (this is 
just a sketch):

  1. Count the number of nodes of each type.
  2. Allocate todo (n_nodes elements), node_successor (n_nodes+1
     elements), and dirty_nodes (n_nodes elements).
  3. Count the number of successors for each node. Store this temporary
     number in node_successor.
  4. Allocate successor.
  5. Initialize node_successor:
      int ofs= 0;
      for (int i= 0; i <= n_nodes; i++) {
        int n_succ= node_successor[i];
        node_successor[i]= ofs;
        ofs += n_succ;
  6. Fill successor with elements. During this algorithm, we need to
     store the number of successors of each node that have been added
     to the array. This number can be stored in dirty_nodes.
  7. Initialize dirty_nodes.

bzr commit into mysql-5.1-bugteam branch (zhenxing.he:3418) Bug#42415He Zhenxing9 Dec
  • Re: bzr commit into mysql-5.1-bugteam branch (zhenxing.he:3418) Bug#42415Sven Sandberg14 Jan
    • Re: bzr commit into mysql-5.1-bugteam branch (zhenxing.he:3418)Bug#42415He Zhenxing18 Jan