On 02/24/11 08:10 PM, Ole John Aske wrote:
> On 24.02.2011 12:59, Jonas Oreland wrote:
>> On 02/24/11 12:21, Ole John Aske wrote:
>>> On 23.02.2011 19:15, Jonas Oreland wrote:
>>>> On 02/23/11 19:07, Ole John Aske wrote:
>>>>> On 23.02.2011 18:00, Jonas Oreland wrote:
>>>>>> Hi OJ,
>>>>>> 1) What do you think about using the heuristic even if we don't have
>>>>>> row-count? E.g. if we don't have row-count, set it to 100 and run your
>>>>>> range-heuristic anyway.
>>>>> Yes, that would likely be useful in those cases.
>>>> can you produce a deterministic testcase for this too...i'd be happy...
>>>> but i'm also ok with skipping if it's hard
>>> I got some second thoughts here:
>>> There are several handler methods which currently trap the 'don't have
>>> row-count' state. Off the top of my head, ::records_in_range(), ::info()
>>> and ::scan_time() might be some of them.
>>> Currently 'don't have row-count' is handled by assuming 2 rows.
>>> - Changing this to assume 100 rows for ::records_in_range() only
>>> will likely cause situations where a full table scan is preferred
>>> over a range scan, which would be a performance regression.
>>> - Changing 'unknown' to be 100 rows in all places where it is trapped
>>> is likely to change lots of query plans and EXPLAIN outputs in MTR tests.
>>> ... I don't think we want that either.
>>> So my suggestion is not to change this, and instead wait for my
>>> 'persistent statistics' patch, which will hopefully remove all of these
>>> 'unknown row-count' situations.
>> i think changing unknown from 2 to 100 sounds ok...even good (at all
>> locations of unknown), and i can't think of any scenario in which this
>> would result in a poorer query plan than what one can get with 2...can you?
> I can test this and report back later
After investigation and testing:
Changing 'unknown' from 2 to 100 is not a simple task...
The ha_ndbcluster code is not particularly clean in this area:
- The 'local statistics' object, where the 'unknown' state should have been
represented, is currently neither maintained nor used.
- Instead, the #rows statistics are written directly into the handler::stats
object, where #rows becomes 'known' as soon as a statistics update has been
requested, either from the datanodes or from the cached 'local statistics'.
This causes an information loss: it is no longer possible to distinguish between
a genuinely low #rows and a low #rows which was previously estimated, or 'faked',
because the real count was 'unknown'.
I have implemented an alternative solution where we call 'update_stats()' on
demand whenever we encounter an 'unknown' #rows in the ::records_in_range()
heuristics. This is similar to the existing code in ::records_in_range() which
is executed when 'ndb-index-stat-enable=1'.
Furthermore, instrumenting the code shows that 'unknown' is then never actually
encountered in this part of the code.
> Ole John