Hello Guilhem,
On 05/27/2011 05:41 PM, Guilhem Bichot wrote:
> Hello,
>
> I deleted all "ok" points where I have nothing to add.
>
> Gleb Shchepa a écrit, Le 25.05.2011 16:21:
>> On 05/10/2011 12:31 AM, Guilhem Bichot wrote:
>
>>>> 3358 Gleb Shchepa 2011-04-01
>>>> WL#4897: Add EXPLAIN INSERT/UPDATE/DELETE
>>>> WL#4897 implements EXPLAIN command for INSERT, REPLACE and
>>>> single- and multi-table UPDATE and DELETE queries.
>
>>>> === added file 'mysql-test/r/explain_non_select.result'
>>>> +#
>>>> +# query: DELETE FROM t1 WHERE (@a:= a) ORDER BY a LIMIT 1
>>>> +# select: SELECT * FROM t1 WHERE (@a:= a) ORDER BY a LIMIT 1
>>>> +#
>>>> +EXPLAIN DELETE FROM t1 WHERE (@a:= a) ORDER BY a LIMIT 1;
>>>> +id select_type table type possible_keys key key_len
> ref rows Extra
>>>> +1 SIMPLE t1 ALL PRIMARY PRIMARY 4 NULL 4
> Using where
>>>> +EXPLAIN EXTENDED DELETE FROM t1 WHERE (@a:= a) ORDER BY a LIMIT 1;
>>>> +id select_type table type possible_keys key key_len
> ref rows filtered Extra
>>>> +1 SIMPLE t1 ALL PRIMARY PRIMARY 4 NULL 4
> 100.00 Using where
>>>> +EXPLAIN EXTENDED SELECT * FROM t1 WHERE (@a:= a) ORDER BY a LIMIT 1;
>>>> +id select_type table type possible_keys key key_len
> ref rows filtered Extra
>>>> +1 SIMPLE t1 index NULL PRIMARY 4 NULL 1
> 400.00 Using where; Using index
>>>
>>> GB10 400.0 is surprising.
>>> The manual says "The filtered column indicates an estimated percentage of
> table rows that will be filtered by the table condition", we could imagine it should be
> <=100. I never saw values greater than 100 until now.
>>> This isn't due to your patch, but I suggest filing a bug report. There's
>>> another case below (search for 352040 below).
>>
>> Will report. I can't find an existent bug report for this, but I remember a
> similar one (probably just a déjà vu).
>
> as you said, it's BUG#34124.
Yep.
>
>>>> +CREATE TABLE t1 (i INT, j INT);
>>>> +INSERT INTO t1 VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5);
>>>> +#
>>>> +# query: DELETE FROM t1
>>>> +# select: SELECT * FROM t1
>>>> +#
>>>> +EXPLAIN DELETE FROM t1;
>>>> +id select_type table type possible_keys key key_len
> ref rows Extra
>>>> +1 SIMPLE NULL NULL NULL NULL NULL NULL 5
> Using delete_all_rows
>>>
>>> GB12 it would be good to test with an innodb table too, to show that
>>> EXPLAIN properly reports that innodb doesn't use "delete_all_rows". See
>>> also my comment about sql_delete.cc.
>>
>> Done. Also I'm thinking about one large include file and two small test files,
> one for MyISAM and one for InnoDB.
>
> good idea
Done.
>
>>>> +#
>>>> +# query: DELETE FROM t2 WHERE b = 10 ORDER BY a, c LIMIT 5
>>>> +# select: SELECT * FROM t2 WHERE b = 10 ORDER BY a, c LIMIT 5
>>>> +#
>>>> +EXPLAIN DELETE FROM t2 WHERE b = 10 ORDER BY a, c LIMIT 5;
>>>> +id select_type table type possible_keys key key_len
> ref rows Extra
>>>> +1 SIMPLE t2 ALL a NULL NULL NULL 26 Using
> where; Using filesort
>>>> +EXPLAIN EXTENDED DELETE FROM t2 WHERE b = 10 ORDER BY a, c LIMIT 5;
>>>> +id select_type table type possible_keys key key_len
> ref rows filtered Extra
>>>> +1 SIMPLE t2 ALL a NULL NULL NULL 26 100.00
> Using where; Using filesort
>>>> +Warnings:
>>>> +Warning 1713 Cannot use range access on index 'a' due to type or
> collation conversion on field 'b'
>>>> +EXPLAIN EXTENDED SELECT * FROM t2 WHERE b = 10 ORDER BY a, c LIMIT 5;
>>>> +id select_type table type possible_keys key key_len
> ref rows filtered Extra
>>>> +1 SIMPLE t2 ALL NULL NULL NULL NULL 26 100.00
> Using where; Using filesort
>>>> +Warnings:
>>>> +Note 1003 select `test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS
> `b`,`test`.`t2`.`c` AS `c`,`test`.`t2`.`d` AS `d` from `test`.`t2` where (`test`.`t2`.`b`
> = 10) order by `test`.`t2`.`a`,`test`.`t2`.`c` limit 5
>>>
>>> GB14 here DELETE has warning 1713 but SELECT doesn't. It's understandable, as
> DELETE considered "a" (in "possible_keys"), then realized it was not usable; whereas
> SELECT didn't consider "a" so didn't even come to seeing the type/collation problem.
> However I don't know why DELETE considered index "a" and SELECT didn't...
>>
>> I don't have a detailed answer for you ATM. Will answer later. Single-table
> DELETE implementation differs from SELECT implementation significantly.
>> I think that SELECT filters this index out before the "quick select" probe.
>
> ok. don't bother researching this, this is probably not related to your code
It seems like so.
>
>>>> === added file 'sql/opt_explain.cc'
>>>> --- a/sql/opt_explain.cc 1970-01-01 00:00:00 +0000
>>>> +++ b/sql/opt_explain.cc 2011-04-01 13:53:57 +0000
>
>>>> +/**
>>>> + A base for all Explain_* classes
>>>> +
>>>> + This class hierarchy is a successor of the old select_describe()
> function
>>>> + implementation. It extends old select_describe() functionality to deal
> with
>>>> + single-table data-modifying commands (UPDATE and DELETE).
>
>>> GB16 in the final push, I guess you will delete references to old code
>>> (like "old select_describe()"). Those mentions have helped me greatly to
>>> understand the transition between trunk and your tree. But once your
>>> tree is trunk, they will not be needed anymore...?
>>
>> Ok... (Actually it was added not for review only but for easier code browsing in
> the future.)
>
> Ah, indeed, so it was useful. Then you can just leave the text above ("This class
> hierarchy is a successor of the old select_describe() function of 5.5"), and delete other
> mentions of old code; that should be sufficient for the code readers of the future.
Ok.
>
>>>> +*/
>>>> +
>>>> +class Explain
>>>> +{
>>>> +private:
>>>> + List<Item> items; ///< item list to feed
> select_result::send_data()
>>>> + Item_null *nil; ///< pre-allocated NULL item to fill empty columns
> in EXPLAIN
>>>> +
>>>> +protected:
>>>> + /*
>>>> + Next "col_*" fields are intended for the filling by "explain_*()"
> methods.
>>>> + Then the make_list() method links these Items into "items" list.
>>>> +
>>>> + NOTE: NULL value means that Item_null object will be pushed into
> "items"
>>>> + list instead.
>>>> + */
>>>> + Item_uint *col_id; ///< "id" column: seq. number of SELECT
> withing the query
>>>
>>> GB17 protected member variables (not member functions) are discouraged in the
> two C++ books I have (Stroustrup, Meyers). Stroustrup: "declaring data members protected
> is usually a design error".
>>> But I don't have yet a good alternative to suggest. Maybe I will find one.
>>
>> If you see a more elegant way, you are welcome.
>
> I don't see a better way. I think we are in a case where things are unlikely to go
> wrong. I.e. I agree with the guy here:
> http://www.parashift.com/c++-faq-lite/basics-of-inheritance.html#faq-19.8
Ok.
>
>>>
>>>> + Item_string *col_select_type; ///< "select_type" column
>>>> + Item_string *col_table_name; ///< "table" to which the row of
>
>>>> + return TRUE;
>>>> + if (thd->send_explain_fields(result) || prepare(result))
>>>
>>> GB41 it's not related to your patch, but I wonder why
>>> send_explain_fields() is a member of THD and not a free function; it
>>> uses only little of THD... Sending fields for a particular command is
>>> not the job of THD, at least there isn't a similar member function for
>>> the other SQL commands (or I missed it).
>>
>> I wonder too.
>
> It's not mandatory to address this in this WL.
Ok.
>
>>>> + {
>>>> + delete result;
>>>> + return TRUE;
>>>> + }
>>>> + }
>>>> + else
>>>> + {
>>>> + result= external_result;
>>>> + external_result->reset_offset_limit_cnt();
>>>
>>> GB42 I commented out the call above and no test failed. Do you have an idea
> how to get coverage for it?
>>
>> AFAIU we need a query with multi-line EXPLAIN output and a LIMIT N clause, where
> N is shorter than the number of EXPLAIN output rows.
>
> do you think you can find a testcase? or is it too difficult? I wonder if it could be
> code which used to serve but has become useless at some unknown point of the past. As this
> reset_offset_limit_cnt() function is a bit of a hack, it would be good if we could remove
> it...
Sorry, this function resets _offset_ value (not limit).
I've added a test to explain.test, since INSERT/UPDATE/... don't support offset values.
Also I've updated the commentary before the reset_offset_limit_cnt().
>
>>>> + if (quick_type == QUICK_SELECT_I::QS_TYPE_ROR_INTERSECT &&
>>>> +
> !((QUICK_ROR_INTERSECT_SELECT*)tab->select->quick)->need_to_fetch_row)
>>>> + key_read=1;
>>>
>>> GB54 not your code, but can change "key_read" to bool, store true/false in
> it, and declare only closer to where it's used (which is an if() further down).
>>
>> Done.
>
> "key_read" can be declared further down, closer to where it's used.
Moved.
>
>>>> +bool Explain_table::explain_extra()
>>>> +{
>>>
>>> GB58 this one also looks like copy-paste of selected bits of
> Explain_join::extra(), is it avoidable?
>>
>> Ok... But in this case some old regression tests will fail on reordered "extra"
> output.
>
> I think it's an acceptable behaviour change. Maybe worth telling the docs team, so
> that they put a note in the changelog?
Agree.
>
>>>> + This function forms the 1st row of the QEP output with a simple text
> message.
>>>> + This is useful to explain such trivial cases as "No tables used" etc.
>>>> + + NOTE: Also this function explains the rest of QEP (subqueries or
> joined
>>>> + tables if any).
>>>
>>> GB65 NOTE -> @note (same remark for all NOTE in the patch).
>>
>> Ok.
>
> remains just one NOTE->@note change to do in the doxygen comment of Explain_msg
Fixed.
>
>>>> + explain_send explain(result);
>>>> + bool res= thd->send_explain_fields(&explain) ||
>>>> + mysql_explain_union(thd, &thd->lex->unit,
> &explain) ||
>>>> + thd->is_error();
>>>> + if (res)
>>>> + explain.abort_result_set();
>>>> + else
>>>> + explain.send_eof();
>>>> + return res;
>>>
>>> GB75 this code above resembles what is done in execute_sqlcom_select() (if
> lex->describe is true), which is:
>>>
>>> thd->send_explain_fields(result);
>>> res= mysql_explain_union(thd, &thd->lex->unit, result);
>>> ...
>>> if (res)
>>> result->abort_result_set();
>>> else
>>> result->send_eof();
>>> Can we make execute_sqlcom_select() call explain_data_modification(), to
>>> avoid duplication?
>>> The creation of the warning (of EXPLAIN EXTENDED) can also move into
> explain_data_modification().
>>
>> Will do with the next commit (need to finish thd->lex->unit.print()
> implementation).
>
> ok
Sorry, not finished yet (probably will be a next WL).
>
>>>> === added file 'sql/opt_explain.h'
>>>> --- a/sql/opt_explain.h 1970-01-01 00:00:00 +0000
>>>> +++ b/sql/opt_explain.h 2011-04-01 13:53:57 +0000
>>>> +bool msg_describe(JOIN *join, const char *message);
>>>> +bool msg_describe(const char *message, ha_rows rows= HA_POS_ERROR);
>>>> +bool table_describe(TABLE *table, SQL_SELECT *select, uint key, ha_rows
> limit,
>>>> + bool need_sort);
>>>> +bool select_describe(JOIN *join, bool need_tmp_table, bool need_order,
>>>> + bool distinct);
>>>> +bool explain_data_modification(select_result_interceptor *result);
>>>
>>> GB78 some "const" can be added over the code; patch is attached.
>
> Did you see the patch which was attached? It proposed to add a few "const".
Thanks, have added some additional consts.
>
>>>> === modified file 'sql/sql_delete.cc'
>>>> --- a/sql/sql_delete.cc 2011-03-17 17:39:31 +0000
>>>> +++ b/sql/sql_delete.cc 2011-04-01 13:53:57 +0000
>
>>>> @@ -219,23 +247,34 @@ bool mysql_delete(THD *thd, TABLE_LIST *
>>>> DBUG_RETURN(TRUE);
>>>> }
>>>> }
>>>> +
>>>> + if (order)
>>>> + {
>>>> + table->update_const_key_parts(conds);
>>>> + order= simple_remove_const(order, conds);
>>>> +
>>>> + usable_index= get_index_for_order(order, table, select, limit,
>>>> + &need_sort, &reverse);
>>>> + }
>>>> +
>>>> + if (thd->lex->describe)
>>>> + {
>>>> + bool err= table_describe(table, select, usable_index, limit,
> need_sort);
>>>> + delete select;
>>>> + free_underlaid_joins(thd, select_lex);
>>>> + DBUG_RETURN(err);
>>>> + }
>>>> +
>>>> if (options & OPTION_QUICK)
>>>> (void) table->file->extra(HA_EXTRA_QUICK);
>>>
>>> GB84 moving the "quick" down this way is apparently ok. Only MyISAM's
>>> "index page deleting" code makes use of this information, and that code
>>> is called only later in this function.
>>
>> Do you prefer to move it up?
>
> no, I was just explaining (to myself, and speaking loud) why your change was
> correct.
Ok.
>
>>>> === modified file 'sql/sql_insert.cc'
>>>> --- a/sql/sql_insert.cc 2011-02-15 17:14:15 +0000
>>>> +++ b/sql/sql_insert.cc 2011-04-01 13:53:57 +0000
>>>> + if (thd->lex->describe)
>>>> + {
>>>> + /*
>>>> + Obviously INSERT without the SELECT is not suitable for EXPLAIN,
> since we
>>>> + don't plan how read tables.
>>>
>>> GB85 this sentence is strange English, even for a French :-)
>>
>> Oops.
>>
>>>
>>>> + So we simply send "No tables used" and stop execution here.
>>>
>>> GB86 it might be interesting to tell the user whether the INSERT
>>> will be using bulk insert or not (this can influence speed). But this is
>>> an engine's decision (in its implementation of start_bulk_insert()), and
>>> we usually do not and cannot report in EXPLAIN what the engine decided...
>>
>> Currently I don't see a way how to show it.
>
> me neither. It's like for "DELETE FROM t;" : I woud like to tell the user whether we
> used delete_all_rows() or row-by-row delete, but this can be known only if we try the
> delete, which isn't possible for EXPLAIN DELETE...
Ok.
>
>>>> === modified file 'sql/sql_update.cc'
>>>> --- a/sql/sql_update.cc 2011-02-25 16:41:57 +0000
>>>> +++ b/sql/sql_update.cc 2011-04-01 13:53:57 +0000
>
>>> GB93 'order && (need_sort||used_key_is_modified)' is an expression
> copied from code a bit further down, could we not copy-paste but rather store that in some
> const and use it in the two places?
>>
>> Fixed. (Actually I've tried that before, but in this case we have to declare
> non-cost variable at the top of the function -- "thanks" to goto. I don't know, what is
> uglier: copy&paste of short expression or top-level declaration of locally-used
> variable).
>
> indeed... sigh...
>
>>> GB94 there are 3 situations:
>>> a) filesort
>>> b) scan table, collect row ids, update table (it's the "else" branch
>>> after filesort)
>>> c) neither a or b.
>>> In table_describe() above we show "a". Could we also show "b"? Or do we
> already?
>>
>> Currently we don't show this case. What "extra" message you suggest to use at
> this place?
>
> I see that in (b) we go into open_cached_file(). Maybe "Using temporary"?
Done.
Before the final push I'll remove rearrangements of "Using temporary" and "Using filesort"
in result files (many result files were affected, now these changes are unnecessary).
>
>>> GB97 as discussed, it would be good to have EXPLAIN EXTENDED print the
>>> transformed query as it does for SELECT.
>>
>> Not done yet. Will be a separate commit a bit later.
>
> ok
Probably it will be in some short WL in the close future. Sorry.
>
>>> GB98 testing "EXPLAIN PARTITIONS DELETE/UPDATE/...", without and with
>>> EXTENDED, would be good.
>>>
>>> GB99 it would be good to have coverage for all new places where we now
>>> send EXPLAIN output; for example the ones where prune_partitions()
>>> shortcuts DELETE/UPDATE; and others.
>>
>> Added some tests. Still working on "100%" coverage.
>
> nice goal.
Done (98.5% or so ;-)
BTW I've hacked the old good dgcov.pl tool to work with cmake! This is really great
script.
>
>>> GB100 in the WL, there is "The solution is a new explain_send class that
>>> initializes tables like INSERT/REPLACE and multi-table UPDATE/DELETE
>>> do". I suggest to change "INSERT/REPLACE" to "INSERT/REPLACE...SELECT".
>
> how about this?
Done.
>
>>> GB101 what about unit testing? I'm not sure it is easily doable, given the
> dependency on THD, Field, range optimizer, JOIN, JOIN_TAB...?
>>
>> I'm thinking about that, but I don't see a good (and profitable) way to cover
> this code with unit tests.
>
> Me neither. You could have a unit test which creates some Explain_* objects and
> checks that they return the correct data, but is that useful?
I think this is a task for regression tests, not for unit tests.
>
>>> GB102 I have to think a bit more about explain_send wrapping the interceptor,
> there may be design changes to suggest there. Also about the relationship between
> mysql_select() and explain_data_modification(). Nothing serious though, and not yet clear
> in my mind. Will keep you posted.
>
> I found nothing to post :-)
>
> Something to do later: working with docs team to document the EXPLAIN output of
> "EXPLAIN non-SELECT".
Ok.
Thank you,
Gleb.