From: Ole John Aske Date: February 28 2011 10:37am Subject: Re: bzr commit into mysql-5.1-telco-7.0 branch (ole.john.aske:4223) Bug#11804277 List-Archive: http://lists.mysql.com/commits/132095 Message-Id: <4D6B7AF9.4020702@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit On 02/24/11 08:10 PM, Ole John Aske wrote: > On 24.02.2011 12:59, Jonas Oreland wrote: >> On 02/24/11 12:21, Ole John Aske wrote: >>> On 23.02.2011 19:15, Jonas Oreland wrote: >>>> On 02/23/11 19:07, Ole John Aske wrote: >>>>> On 23.02.2011 18:00, Jonas Oreland wrote: >>>>>> Hi OJ, >>>>>> >>>>>> 1) What do you think about using heuristic even if we don't have row-count. >>>>>> E.g if we don't have row-count, set it to 100 and run you range-heuristic anyway >>>>> Yes, would likely be usefull in those cases. >>>> can you produce a deterministic testcase for this too...i'd be happy... >>>> but i'm also ok with skipping if it's hard >>> I got some second thoughts here: >>> >>> There are several handler methods which currently trap the 'don't have row-count' >>> state. From top of my head ::records_in_range, ::info() and ::scan_time() might be some >>> of them. >>> >>> Currently 'don't have row-count' is handled by assuming 2 rows. >>> >>> - Changing this to assume 100 rows for ::records_in_range() only >>> will likely cause situations where a full table scan is preferred >>> over a range which will be a performance regression. >>> >>> - Changing 'unknown' to be 100 rows in all places where this is trapped >>> is likely to create lots of changed query plans and EXPLAIN's in MTR tests. >>> ... I don't think we want this either..... >>> >>> So my suggestion is to don't change this and instead wait for my >>> 'persistent statistics' patch which will hopefully remove all of these >>> 'unknown row-count' situations. >> i think changing unknown from 2 to 100...sound ok...even good (at all locations of unknown) >> and can't think of any scenario in which this would result in a poorer query plan than what one >> can get with 2...can you ?::records_in_range > > I can test this and report back later After investigation and testing: Changing 'unknown' from 2 to 100 is not a simple task... The ha_ndbcluster code is not particular clean in this area: - The 'local statistics' object where 'unknown' information should have been represented is currently not maintained / used. - Instead #rows statistics is set directly into handler::stats object where #rows becomes 'known' immediately after statistics is requested updated either from datanodes or from cached 'local statistics'. It is therefore an information loss where it is not possible to distinguise between a low real #rows, and a low #rows previously estimated or 'faked' due to 'unknown' statistics. I have implemented an alternative solution where we 'update_stats()' on demand whenever we encounter 'unknown' #rows in ::records_in_range() heuristics. This is similar to existing code in ::records_in_range() which is executed when 'ndb-index-stat-enable=1' ... Furthermore, instrumenting code shows that 'unknown' never is encountered in this part of code.... > > Ole John >> /Jonas > >