Hi Jørgen,
Thanks for taking a systematic approach for fixing this. The commit
message is great and helped a lot to understand how the patch solves the
problem.
The fix looks good and works as described in the commit message and the
comments in the source code. I have only some very minor comments that
you mostly are free to ignore.
OK to push.
Olav
On 06/05/2011 15:57, Jorgen Loland wrote:
> #At file:///export/home/jl208045/mysql/mysql-trunk-11765831/ based on
> revid:jorgen.loland@stripped
>
> 3382 Jorgen Loland 2011-05-06
> BUG#11765831: 'RANGE ACCESS' MAY INCORRECTLY FILTER
> AWAY QUALIFYING ROWS
>
> The problem was that the ranges created when OR'ing two
> conditions could be incorrect. Without the bugfix,
> "I<> 6 OR (I<> 8 AND J = 5)" would create these ranges:
>
> "NULL< I< 6",
> "6<= I<= 6 AND 5<= J<= 5",
> "6< I< 8",
> "8<= I<= 8 AND 5<= J<= 5",
> "8< I"
>
> While the correct ranges is
> "NULL< I< 6",
> "6<= I<= 6 AND 5<= J<= 5",
> "6< I"
>
> The problem occurs when key_or() ORs
> (1) "NULL< I< 6, 6<= I<= 6 AND 5<= J<= 5, 6< I" with
> (2) "8< I AND 5<= J<= 5"
>
> The reason for the bug is that in key_or(), SEL_ARG *tmp is
> used to point to the range in (1) above that is merged with
> (2) while key1 points to the root of the red-black tree of
> (1). When merging (1) and (2), tmp refers to the "6< I"
> part whereas the root is the "6<= ... AND 5<= J<= 5" part.
>
> key_or() decides that the tmp range needs to be split into
> "6< I< 8, 8<= I<= 8, 8< I", in which next_key_part of the
> second range should be that of tmp. However, next_key_part is
> set to key1->next_key_part ("5<= J<= 5") instead of
> tmp->next_key_part (empty). Fixing this gives the correct but
> not optimal ranges:
> "NULL< I< 6",
> "6<= I<= 6 AND 5<= J<= 5",
> "6< I< 8",
> "8<= I<= 8",
> "8< I"
>
> A second problem can be seen above: key_or() may create
> adjacent ranges that could be replaced with a single range.
> Fixes for this is also included in the patch so that the range
> above becomes correct AND optimal:
> "NULL< I< 6",
> "6<= I<= 6 AND 5<= J<= 5",
> "6< I"
>
> Merging adjacent ranges like this gives a slightly lower cost
> estimate for the range access.
> @ mysql-test/include/range.inc
> Add test for BUG#11765831
> @ mysql-test/r/group_min_max.result
> BUG#11765831 merges adjacent ranges, resulting in a slightly lower cost
> estimate for accessing the table through range access.
> @ mysql-test/r/range_icp.result
> Add test for BUG#11765831
> @ mysql-test/r/range_icp_mrr.result
> Add test for BUG#11765831
> @ mysql-test/r/range_mrr.result
> Add test for BUG#11765831
> @ mysql-test/r/range_mrr_cost.result
> Add test for BUG#11765831
> @ mysql-test/r/range_none.result
> Add test for BUG#11765831
> @ sql/opt_range.cc
> In key_or(): When a range in key1 was split due to a partially overlapping
> range in key2, the non-overlaping part incorrectly got next_key_part from the R-B tree
> root node instead of the currently active range. key_or() could also create adjacent
> ranges that could be replaced by a single continous range. This is also fixed.
>
> modified:
> mysql-test/include/range.inc
> mysql-test/r/group_min_max.result
> mysql-test/r/range_icp.result
> mysql-test/r/range_icp_mrr.result
> mysql-test/r/range_mrr.result
> mysql-test/r/range_mrr_cost.result
> mysql-test/r/range_none.result
> sql/opt_range.cc
> === modified file 'mysql-test/include/range.inc'
> --- a/mysql-test/include/range.inc 2011-02-01 12:47:39 +0000
> +++ b/mysql-test/include/range.inc 2011-05-06 13:57:35 +0000
> @@ -1471,3 +1471,35 @@ INSERT INTO t2 VALUES (1, 1, 2);
> EXPLAIN SELECT * FROM t2 WHERE a = 1 AND b>= 2 AND c>= 2;
>
> DROP TABLE t1, t2;
> +
> +--echo #
> +--echo # BUG#11765831: 'RANGE ACCESS' MAY INCORRECTLY FILTER
> +--echo # AWAY QUALIFYING ROWS
> +--echo #
> +
> +CREATE TABLE t10(
> + K INT NOT NULL AUTO_INCREMENT,
> + I INT, J INT,
> + PRIMARY KEY(K),
> + KEY(I,J)
> +);
> +INSERT INTO t10(I,J) VALUES (6,1),(6,2),(6,3),(6,4),(6,5),
> + (6,6),(6,7),(6,8),(6,9),(6,0);
> +
> +CREATE TABLE t100 LIKE t10;
> +INSERT INTO t100(I,J) SELECT X.I, X.K+(10*Y.K) FROM t10 AS X,t10 AS Y;
> +
> +# Insert offending value:
> +INSERT INTO t100(I,J) VALUES(8,26);
> +
> +let $query= SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +
> +#Verify that 'range' access will be used
> +--echo
> +--eval EXPLAIN $query
> +
> +# Only row 101,8,26 should be returned
> +--echo
> +--eval $query
> +
> +DROP TABLE t10,t100;
>
> === modified file 'mysql-test/r/group_min_max.result'
> --- a/mysql-test/r/group_min_max.result 2011-04-27 09:10:45 +0000
> +++ b/mysql-test/r/group_min_max.result 2011-05-06 13:57:35 +0000
> @@ -876,10 +876,10 @@ id select_type table type possible_keys
> 1 SIMPLE t1 range NULL idx_t1_1 163 NULL 17 Using where; Using index for group-by
> explain select a1,a2,b, max(c) from t1 where (c> 'b1') or (c<= 'g1')
> group by a1,a2,b;
> id select_type table type possible_keys key key_len ref rows Extra
> -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 17 Using where; Using index for group-by
> +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 17 Using where; Using index for group-by
> explain select a1,a2,b,min(c),max(c) from t1 where (c> 'b1') or (c<= 'g1')
> group by a1,a2,b;
> id select_type table type possible_keys key key_len ref rows Extra
> -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 17 Using where; Using index for group-by
> +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 17 Using where; Using index for group-by
> explain select a1,a2,b,min(c),max(c) from t1 where (c> 'b111') and (c<=
> 'g112') group by a1,a2,b;
> id select_type table type possible_keys key key_len ref rows Extra
> 1 SIMPLE t1 range NULL idx_t1_1 163 NULL 17 Using where; Using index for group-by
> @@ -924,7 +924,7 @@ id select_type table type possible_keys
> 1 SIMPLE t2 range NULL idx_t2_1 163 NULL # Using where; Using index for group-by
> explain select a1,a2,b, max(c) from t2 where (c> 'b1') or (c<= 'g1')
> group by a1,a2,b;
> id select_type table type possible_keys key key_len ref rows Extra
> -1 SIMPLE t2 range NULL idx_t2_1 163 NULL # Using where; Using index for group-by
> +1 SIMPLE t2 range NULL idx_t2_1 146 NULL # Using where; Using index for group-by
> explain select a1,a2,b,min(c),max(c) from t2 where (c> 'b1') or (c<= 'g1')
> group by a1,a2,b;
> id select_type table type possible_keys key key_len ref rows Extra
> 1 SIMPLE t2 range NULL idx_t2_1 163 NULL # Using where; Using index for group-by
>
> === modified file 'mysql-test/r/range_icp.result'
> --- a/mysql-test/r/range_icp.result 2011-02-01 12:47:39 +0000
> +++ b/mysql-test/r/range_icp.result 2011-05-06 13:57:35 +0000
> @@ -1849,4 +1849,28 @@ EXPLAIN SELECT * FROM t2 WHERE a = 1 AND
> id select_type table type possible_keys key key_len ref rows Extra
> 1 SIMPLE t2 ref k,l,m,n l 5 const 69 Using index condition; Using where
> DROP TABLE t1, t2;
> +#
> +# BUG#11765831: 'RANGE ACCESS' MAY INCORRECTLY FILTER
> +# AWAY QUALIFYING ROWS
> +#
> +CREATE TABLE t10(
> +K INT NOT NULL AUTO_INCREMENT,
> +I INT, J INT,
> +PRIMARY KEY(K),
> +KEY(I,J)
> +);
> +INSERT INTO t10(I,J) VALUES (6,1),(6,2),(6,3),(6,4),(6,5),
> +(6,6),(6,7),(6,8),(6,9),(6,0);
> +CREATE TABLE t100 LIKE t10;
> +INSERT INTO t100(I,J) SELECT X.I, X.K+(10*Y.K) FROM t10 AS X,t10 AS Y;
> +INSERT INTO t100(I,J) VALUES(8,26);
> +
> +EXPLAIN SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t100 range I I 10 NULL 4 Using index condition
> +
> +SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +K I J
> +101 8 26
> +DROP TABLE t10,t100;
> set optimizer_switch=default;
>
> === modified file 'mysql-test/r/range_icp_mrr.result'
> --- a/mysql-test/r/range_icp_mrr.result 2011-02-01 12:47:39 +0000
> +++ b/mysql-test/r/range_icp_mrr.result 2011-05-06 13:57:35 +0000
> @@ -1849,4 +1849,28 @@ EXPLAIN SELECT * FROM t2 WHERE a = 1 AND
> id select_type table type possible_keys key key_len ref rows Extra
> 1 SIMPLE t2 ref k,l,m,n l 5 const 69 Using index condition; Using where
> DROP TABLE t1, t2;
> +#
> +# BUG#11765831: 'RANGE ACCESS' MAY INCORRECTLY FILTER
> +# AWAY QUALIFYING ROWS
> +#
> +CREATE TABLE t10(
> +K INT NOT NULL AUTO_INCREMENT,
> +I INT, J INT,
> +PRIMARY KEY(K),
> +KEY(I,J)
> +);
> +INSERT INTO t10(I,J) VALUES (6,1),(6,2),(6,3),(6,4),(6,5),
> +(6,6),(6,7),(6,8),(6,9),(6,0);
> +CREATE TABLE t100 LIKE t10;
> +INSERT INTO t100(I,J) SELECT X.I, X.K+(10*Y.K) FROM t10 AS X,t10 AS Y;
> +INSERT INTO t100(I,J) VALUES(8,26);
> +
> +EXPLAIN SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t100 range I I 10 NULL 4 Using index condition; Using MRR
> +
> +SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +K I J
> +101 8 26
> +DROP TABLE t10,t100;
> set optimizer_switch=default;
>
> === modified file 'mysql-test/r/range_mrr.result'
> --- a/mysql-test/r/range_mrr.result 2011-02-01 12:47:39 +0000
> +++ b/mysql-test/r/range_mrr.result 2011-05-06 13:57:35 +0000
> @@ -1849,4 +1849,28 @@ EXPLAIN SELECT * FROM t2 WHERE a = 1 AND
> id select_type table type possible_keys key key_len ref rows Extra
> 1 SIMPLE t2 ref k,l,m,n l 5 const 69 Using where
> DROP TABLE t1, t2;
> +#
> +# BUG#11765831: 'RANGE ACCESS' MAY INCORRECTLY FILTER
> +# AWAY QUALIFYING ROWS
> +#
> +CREATE TABLE t10(
> +K INT NOT NULL AUTO_INCREMENT,
> +I INT, J INT,
> +PRIMARY KEY(K),
> +KEY(I,J)
> +);
> +INSERT INTO t10(I,J) VALUES (6,1),(6,2),(6,3),(6,4),(6,5),
> +(6,6),(6,7),(6,8),(6,9),(6,0);
> +CREATE TABLE t100 LIKE t10;
> +INSERT INTO t100(I,J) SELECT X.I, X.K+(10*Y.K) FROM t10 AS X,t10 AS Y;
> +INSERT INTO t100(I,J) VALUES(8,26);
> +
> +EXPLAIN SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t100 range I I 10 NULL 4 Using where; Using MRR
> +
> +SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +K I J
> +101 8 26
> +DROP TABLE t10,t100;
> set optimizer_switch=default;
>
> === modified file 'mysql-test/r/range_mrr_cost.result'
> --- a/mysql-test/r/range_mrr_cost.result 2011-02-01 12:47:39 +0000
> +++ b/mysql-test/r/range_mrr_cost.result 2011-05-06 13:57:35 +0000
> @@ -1849,4 +1849,28 @@ EXPLAIN SELECT * FROM t2 WHERE a = 1 AND
> id select_type table type possible_keys key key_len ref rows Extra
> 1 SIMPLE t2 ref k,l,m,n l 5 const 69 Using where
> DROP TABLE t1, t2;
> +#
> +# BUG#11765831: 'RANGE ACCESS' MAY INCORRECTLY FILTER
> +# AWAY QUALIFYING ROWS
> +#
> +CREATE TABLE t10(
> +K INT NOT NULL AUTO_INCREMENT,
> +I INT, J INT,
> +PRIMARY KEY(K),
> +KEY(I,J)
> +);
> +INSERT INTO t10(I,J) VALUES (6,1),(6,2),(6,3),(6,4),(6,5),
> +(6,6),(6,7),(6,8),(6,9),(6,0);
> +CREATE TABLE t100 LIKE t10;
> +INSERT INTO t100(I,J) SELECT X.I, X.K+(10*Y.K) FROM t10 AS X,t10 AS Y;
> +INSERT INTO t100(I,J) VALUES(8,26);
> +
> +EXPLAIN SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t100 range I I 10 NULL 4 Using where
> +
> +SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +K I J
> +101 8 26
> +DROP TABLE t10,t100;
> set optimizer_switch=default;
>
> === modified file 'mysql-test/r/range_none.result'
> --- a/mysql-test/r/range_none.result 2011-02-01 12:47:39 +0000
> +++ b/mysql-test/r/range_none.result 2011-05-06 13:57:35 +0000
> @@ -1848,4 +1848,28 @@ EXPLAIN SELECT * FROM t2 WHERE a = 1 AND
> id select_type table type possible_keys key key_len ref rows Extra
> 1 SIMPLE t2 ref k,l,m,n l 5 const 69 Using where
> DROP TABLE t1, t2;
> +#
> +# BUG#11765831: 'RANGE ACCESS' MAY INCORRECTLY FILTER
> +# AWAY QUALIFYING ROWS
> +#
> +CREATE TABLE t10(
> +K INT NOT NULL AUTO_INCREMENT,
> +I INT, J INT,
> +PRIMARY KEY(K),
> +KEY(I,J)
> +);
> +INSERT INTO t10(I,J) VALUES (6,1),(6,2),(6,3),(6,4),(6,5),
> +(6,6),(6,7),(6,8),(6,9),(6,0);
> +CREATE TABLE t100 LIKE t10;
> +INSERT INTO t100(I,J) SELECT X.I, X.K+(10*Y.K) FROM t10 AS X,t10 AS Y;
> +INSERT INTO t100(I,J) VALUES(8,26);
> +
> +EXPLAIN SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +id select_type table type possible_keys key key_len ref rows Extra
> +1 SIMPLE t100 range I I 10 NULL 4 Using where
> +
> +SELECT * FROM t100 WHERE I<> 6 OR (I<> 8 AND J = 5);
> +K I J
> +101 8 26
> +DROP TABLE t10,t100;
> set optimizer_switch=default;
>
> === modified file 'sql/opt_range.cc'
> --- a/sql/opt_range.cc 2011-05-06 13:26:31 +0000
> +++ b/sql/opt_range.cc 2011-05-06 13:57:35 +0000
> @@ -7069,11 +7069,53 @@ key_or(RANGE_OPT_PARAM *param, SEL_ARG *
> This is the case ("cmp>=0" means that tmp.max>= key2.min):
> key2: [----]
> tmp: [------------*****]
> + */
> +
> + if (!tmp->next_key_part)
> + {
> + /*
> + tmp has no next_key_part, thus
> + tmp->next_key_part == (tmp->next_key_part OR
> key2->next_key_part)
> + == empty
> +
> + The range that is covered by tmp can be cut from key2
I found this a bit hard to read and understand. I think the more
textly/verbose way of explain a similar situation you used further down
(as step 0) was easier to understand. Suggestion: Change this to be more
like the explanation for "step 0" below.
> + */
Remove on "space" in front of */
> + if (tmp->cmp_max_to_max(key2)>= 0)
> + {
> + /*
> + tmp covers the entire range in key2.
> + key2: [----]
> + tmp: [-----------------]
> +
> + Move on to next range in key2
> + */
> + key2->increment_use_count(-1); // Free not used tree
> + key2=key2->next;
> + continue;
> + }
> + else
> + {
> + /*
> + This is the case:
> + key2: [-------]
> + tmp: [---------]
> +
> + Result:
> + key2: [---]
> + tmp: [---------]
> + */
> + key2->copy_max_to_min(tmp);
> + continue;
> + }
> + }
>
> + /*
> The ranges are overlapping but have not been merged because
> - next_key_part of tmp and key2 are different
> + next_key_part of tmp and key2 differ.
> + key2: [----]
> + tmp: [------------*****]
>
> - Result:
> + Split tmp in two where key2 starts:
> key2: [----]
> key1: [--------][--*****]
> ^ ^
> @@ -7082,7 +7124,7 @@ key_or(RANGE_OPT_PARAM *param, SEL_ARG *
> SEL_ARG *new_arg=tmp->clone_first(key2);
> if (!new_arg)
> return 0; // OOM
> - if ((new_arg->next_key_part= key1->next_key_part))
> + if ((new_arg->next_key_part= tmp->next_key_part))
> new_arg->increment_use_count(key1->use_count+1);
> tmp->copy_min_to_min(key2);
> key1=key1->insert(new_arg);
> @@ -7191,12 +7233,21 @@ key_or(RANGE_OPT_PARAM *param, SEL_ARG *
> ^ ^
> new_arg tmp
> Steps:
> + 0) If tmp->next_key_part is empty: do nothing. Reason:
> + (key2_cpy->next_key_part OR tmp->next_key_part) will be
> + empty and therefore equal to tmp->next_key_part. Thus,
> + the range in key2_cpy is completely covered by tmp
> 1) Make new_arg with range [tmp.min, key2_cpy.max].
> new_arg->next_key_part is OR between next_key_part
> of tmp and key2_cpy
> 2) Make tmp the range [key2.max, tmp.max]
> 3) Insert new_arg into key1
> */
> + if (!tmp->next_key_part) // Step 0)
Minor suggestion: just write "Step 0" (without the ")" ) as it makes it
harder to see that this is a comment?
> + {
> + key2_cpy.increment_use_count(-1); // Free not used tree
> + break;
> + }
> SEL_ARG *new_arg=tmp->clone_last(&key2_cpy);
> if (!new_arg)
> return 0; // OOM
>
>
>
>