Thank you Timour for explaining this to me. I am sorry that I not only
managed to confuse myself, but also you by talking about sub-query
materialization when what I really meant was semi-join materialization.
Given bug#48213, it seems that semi-join materialization does not
take the necessary actions to prevent materialization from being used
with blob columns.
It still confuses me a bit that temporary tables with blobs are
created in the first place. Why does create_tmp_table() need to
handle blob columns? Are there scenarios where such tables are of use
even if you are not able to perform look-ups on them?
--
Øystein
Timour Katchaounov wrote:
> Øystein,
>
>> Hi,
>>
>> Is sub-query materialization supposed to work in the context of BLOB
>> columns? Given the following query (ref. Bug#48213) where column e is
>
> No, it is not supposed to work, and this is clearly encoded and commented
> in the implementation of subquery materialization. There are also
explicit
> tests for this.
>
> In item_subselect.cc, method subselect_hash_sj_engine::init_permanent
>
> there is a comment:
>
> /*
> If the subquery has blobs, or the total key lenght is bigger
than some
> length, then the created index cannot be used for lookups and we
> can't use hash semi join. If this is the case, delete the temporary
> table since it will not be used, and tell the caller we failed to
> initialize the engine.
> */
>
> In addition, in the regression test
> subselect_mat.test
> there is a set of tests that make sure that we do *not* use
materialization
> in the presence of blobs and long text columns. The tests follow
after the
> comment:
>
> /*
> Test that BLOBs are not materialized (except when arguments of some
> functions).
> */
>
> The reason for this limitation is that unique indexes created on BLOB
> and long
> (AFAIR > 512 bytes) columns can be used only to ensure uniqueness, but
> cannot
> be used for lookup of actual values.
>
> As I was told by Monty and Serg, this can be fixed, but my goal was to
> implement
> materialization of subqueries, not to get stuck on some handler code,
so I
> preferred to accept this limitation.
>
>> BLOB and the rest is VARCHAR:
>>
>> SELECT pk FROM t1 WHERE (b,c,d,e)
>> IN (SELECT b,c,d,e FROM t2 WHERE pk > 0);
>>
>> Should it be possible to use materialization to evaluate this query?
>
> No, as is obvious from the code, and the comments in the code.
>
>> create_tmp_table(), which is used to create the materialized table,
>> contains the following comment:
>>
>> /*
>> Special mode for index creation in MyISAM used to support unique
>> indexes on blobs with arbitrary length. Such indexes cannot be
>> used for lookups.
>> */
>>
>> If this index cannot be used for lookup, how can it be of any use for
>> materialization?
>
> Exactly, it cannot, and it isn't.
>
>>
>> I am a bit confused,
>>
>
> I don't understand why you were confused. The behavior is pretty
> consistent,
> and the comments pretty explicit. There are even tests for this behavior.
>
>
> Timour
>