Hi Sergey,
thank you for the extra information, I replied to your commit e-mail
separately.
Sergey Glukhov wrote:
> Hi,
>
> On 12/13/2010 03:26 PM, Martin Hansson wrote:
>> Sergey Glukhov skrev 2010-12-10 13.06:
>>> #At file:///home/gluh/MySQL/mysql-5.1-bugteam/ based on
>>> revid:bjorn.munch@stripped
>>>
>>> 3515 Sergey Glukhov 2010-12-10
>>> Bug#48916 Server incorrectly processing HAVING clauses with
>>> an ORDER BY clause
>>> Before sorting HAVING condition is splitted into two parts,
>>> first past is a table related condition and the rest of is
>>> HAVING part. Extraction of HAVING part does not take into
>>> account
>>> the fact that some of conditions might be non-const but
>>> have 'used_tables' == 0 (undependent subqueries)
>>> and because of that these conditions are cut off by
>>> make_cond_for_table() function.
>>> The fix is to use (table_map) 0 instead of used_tables in
>>> third argument for make_cond_for_table() function.
>>> It allows to extract elements which belong to sorted
>>> table and in addition elements which are undependend
>>> subqueries.
>> You might want to run a spell-checker on the comment. Misspelled
>> change set comments take longer to read.
>>
>> I don't understand your fix. Ok, I understand that the IN condition
>> is non-constant but still doesn't refer to any tables. But why is
>> there any difference between sending a constant 0 and a variable
>> which is 0? Actually,
> The difference is that IN subquery is uncacheable that makes it
> non-const.
Isn't it more correct to say that it is both non-const and non-cacheable
since it is transformed into a correlated EXISTS sub-query? And the fact
that it doesn't reference any tables is probably because (3,2) is a
constant expression.
By the way, why does it have to be a row-valued predicand? What about
scalar expressions, e.g. HAVING 2 IN (SELECT f2 FROM t2). I can repeat
the bug with that. But I agree that row expressions should be tested as
well.
> Check Item_in_subselect::row_value_transformer where
> we set subquery into UNCACHEABLE state and also check
> Item_subselect::fix_fields where we make this subquery
> non-const depending on UNCACHEABLE flag.
>
>> I ran some test on my machine and I get used_tables == 1. Have I
>> missed something?
> yes, lets look at this condition:
>
> (3, 2) IN (SELECT 4, 2) AND t1.f1 >= 0
>
> First part '(3, 2) IN (SELECT 4, 2)' has used_tables == 0;
> Second part 't1.f1 >= 0' has used_tables == 1;
>
> So COND1->used_tables() | COND2->used_tables() returns 1.
My mistake, I confused used_tables for used_table, heh....
>
>
>>
>>
>> I ran the test with your fix and as you're saying, the condition
>> <in_optimizer>((3,2),<exists>(select 4,2 having (((<cache>(3) =
> 4) or
>> isnull(4)) and ((<cache>(2) = 2) or isnull(2)) and
>> <is_not_null_test>(4) and <is_not_null_test>(2)))) goes missing
>> without your fix, but why?
> Because make_cond_for_table() with third argument not equal to zero
> cuts it off. Probably Roy's comment which is added into
> opt_backporting tree could help to understand the function behaviour:
That comment is very good, but this one is even better:
/*
Ignore this condition if
1. We are extracting conditions for a specific table, and
2. that table is not referenced by the condition, and
3. [...]
*/
if (used_table && // 1
!(cond->used_tables() & used_table) && // 2
...
It probably explains the whole bug, huh?
>
>> /**
>> Extract a condition that can be checked after reading given table
>>
>> @param cond Condition to analyze
>> @param tables Tables for which "current field values" are
>> available
>> @param used_table Table that we're extracting the condition for (may
>> also include PSEUDO_TABLE_BITS, and may be zero)
>> @param exclude_expensive_cond Do not push expensive conditions
>>
>> @retval <>NULL Generated condition
>> @retval = NULL Already checked, OR error
>>
>> @details
>> Extract the condition that can be checked after reading the table
>> specified in 'used_table', given that current-field values for
>> tables
>> specified in 'tables' bitmap are available.
>> If 'used_table' is 0, extract conditions for all tables in 'tables'.
> ^^^
> I would like to add the info that If 'used_table' is 0 the function
> also returns conditions which have used_tables() == 0.
Why not describe it on a higher level? No need to repeat what the code
says. If used_table is 0 <=> we don't extract conditions for any
particular table <=> even conditions that don't apply to one particular
table are included in the produced condition.
I also think you should back-port all applicable comments to 5.1. It
seems it would have saved you trouble had they been there when you
started out...
Best Regards
Martin