From: Ole John Aske Date: February 28 2011 11:38am Subject: bzr push into mysql-5.1-telco-7.0 branch (ole.john.aske:4234 to 4235) List-Archive: http://lists.mysql.com/commits/132103 Message-Id: <20110228113840.BA0DD223@fimafeng09.norway.sun.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit 4235 Ole John Aske 2011-02-28 Fixed compiler warnings introduced by last push. modified: sql/ha_ndbcluster.cc 4234 Ole John Aske 2011-02-28 Fix for bug#11804277 - INCORRECT INDEX MAY BE SELECTED DUE TO INSUFFICIENT STATISTICS FROM CLUSTER Add heuristics to ha_ndbcluster::records_in_range() which identifies a range as: - An open bound range ( LT/GT BETWEEN AND ) - A (partial) EQ-range ( EQ ) ... Or a combination of these.... These are handled as follows: Open bound ranges ----------------- Without a histogram of how the values in the index are distributed, we can only assume an equal distrubution. A statistically correct estimate for a condition of the form ' LT/GT ' would then have been to assume it selects 50% of the rows in the table. However, I have experienced that this will cause the range-cost to directly compete with the cost of a full table scan. We should therefore be somewhat more conservative and estimate 10% of the rows to be returned. Closed bound range ------------------ We assume this to be somewhat better than an open bounded range returning 5% of the rows in the table. EQ-range -------- An EQ-range will excatly specify a fraction of the first part of an index. It is reasonable to assume: - Specifing a larger fraction of the index will improve the selectivity of the EQ-range. - Each part of the specified EQ-range will have the same selectivity. We can model this as a Binomial Distribution of the indexed values. http://en.wikipedia.org/wiki/Binomial_distrib modified: mysql-test/suite/ndb/r/ndb_condition_pushdown.result mysql-test/suite/ndb/r/ndb_index.result mysql-test/suite/ndb/r/ndb_index_unique.result mysql-test/suite/ndb/r/ndb_read_multi_range.result mysql-test/suite/ndb/r/ndb_statistics.result mysql-test/suite/ndb/t/ndb_statistics.test sql/ha_ndbcluster.cc === modified file 'sql/ha_ndbcluster.cc' --- a/sql/ha_ndbcluster.cc 2011-02-28 10:42:04 +0000 +++ b/sql/ha_ndbcluster.cc 2011-02-28 11:38:19 +0000 @@ -11256,7 +11256,7 @@ ha_ndbcluster::records_in_range(uint inx if (eq_fraction >= 1.0) // Exact match -> 1 row DBUG_RETURN(1); - rows = (Uint64)(table_rows / pow(table_rows, eq_fraction)); + rows = (Uint64)((double)table_rows / pow(table_rows, eq_fraction)); if (rows > (table_rows/50)) // EQ-range: Max 2% of rows rows= (table_rows/50); No bundle (reason: useless for push emails).