From: Roy Lyseng Date: September 27 2010 11:32am Subject: WL5274 - Architecture review List-Archive: http://lists.mysql.com/commits/119162 Message-Id: <4CA080E4.8090203@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi Evgen, thanks for the revised specification! I think that we are approaching approval of the architecture review for WL5274. Apart from some small clarifications, the major disagreement point now is LooseScan. The whole text is quoted, and comments are prefixed with '>>'. HLD --- Postpone materialization of views/subqueries in the FROM clause till execution phase. This will allow fast EXPLAINs for said views/subqueries. Late materialization will add an index chosen by the range optimizer to the result table. HLS --- Goals: ===== 1. Avoid materialization when possible 2. Speedup execution of queries by adding indexes to derived tables 2.1 Allowing ref access to derived tables 2.2 Allowing loose index scan over a derived table 1. Delay materialization until it is needed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Consider following cases: Derived tables are materialized for EXPLAINS queries but the result dataset - EXPLAIN - isn't used since the query itself isn't executed. Examples: EXPLAIN SELECT * FROM (SELECT * FROM a_big_table); EXPLAIN SELECT * FROM a_materializable_view; Materialization is useless when a derived table is used in a join and the partial query plan prior to the derived table produced empty result. In this case the query will return empty result and materialized data wouldn't be used at all. Examples: table t1 is (int f1, int f2) and contains values (0,0),(0,1) SELECT * FROM t1 JOIN (SELECT big.f1 AS b1 from a big_table AS BIG) ON t1.f2=b1 WHERE t1.f1 > 0; SELECT * FROM a_materializable_view AS view JOIN t1 ON t1.f2= view.v1 WHERE t1.f1 > 0; These cases could be optimized by moving derived tables materialization to the point when the result is actually needed, i.e. right before the materialized table have to be read. In the worst case (i.e. derived tables were materialized)) query execution will take the same time as before since no additional work is done. In the best case (derived tables weren't materialized) the execution will take time shorter by the time needed to materialize derived table(s). 2. Speedup execution of queries adding indexes to derived tables ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.1 Allowing ref access to derived tables ----------------------------------------- Ref access could greatly reduce amount of data that is needed to join a table. It is used in queries like: SELECT * FROM t1 JOIN (SELECT * FROM t2) tt ON t1.f1=tt.f1; SELECT * FROM t1 JOIN a_materialized_view AS view ON t1.f1=view.f1; In these examples adding index over field f1 from tt/view will bring it into consideration by range optimizer and in case of lowest cost ref access will be used. Appropriate indexes could be added by taking following steps: * Create fake key description for the range optimizer: >> I prefer the term "possible key" - which you use later - over "fake key" For each derived table DT: * For each table (or derived table) T which is joined to DT find set of fields from DT that are used to join table T * For each found set create a description of the key that consists of all fields from the set and each field is included only once * Range optimizer will choose (or wouldn't, depending on cost) fake keys over derived tables to use for ref access. * After query execution plan is chosen, create one real key and drop the rest: For each derived table DT * Actually create the key chosen by the range optimizer * Drop other key descriptions for DT (drop all descriptions if ref access isn't used for DT) * While derived table being materialized and thus filling the result table the storage engine will create index according to saved key description. After these steps we will have a materialized derived table with generated index which is exactly the same as a usual table with index, so execution will go flawlessly. Since ref access is chosen among other access methods due to lower cost the query will be always executed faster than without ref access. If ref access have higher cost than some other access method then no real keys would be created, thus in the worst case we loose nothing. The only overhead is the creating of the fake key descriptions and overhead is negligible if compared to cost of execution a whole query. In addition to key descibe above we could create a composite key which would contain all fields referenced in the derived table. Such key could be used in several cases: .) full key is used when the derive table is placed last in the join .) a prefix of this key could be used in cases where derived table isn't the last one. >> Please add a note saying that this is for future, and not part of this WL 2.2 Allowing loose index scan over a derived table -------------------------------------------------- >> Please add in a reference to WL#1724 and/or WL#3220 (if we keep this part) There is a class of queries with GROUP BY which can be greatly speed up by using loose index scan because it allows to scan only a fraction of the table instead of doing full scan. Queries which could benefit are like following: SELECT MIN(f1) FROM (SELECT ....) tt; SELECT f1, MAX(f2) FROM (SELECT ... UNION SELECT ...) tt GROUP BY 1; SELECT DISTINCT * FROM a_materializable_view; >> I am questioning the usefulness of loose index scan for derived tables >> materialization. >> Imagine the query: >> SELECT MIN(f1) FROM (SELECT a+b AS f1 FROM t1 JOIN t2 ...) >> A more efficient way would be to rewrite this into: >> SELECT MIN(a+b) AS f1 FROM t1 JOIN t2 ... >> >> Another example: >> SELECT f1, MAX(f2) FROM (SELECT k, a+b FROM t1 >> UNION SELECT k, c+d FROM t2) AS tt(f1, f2) >> GROUP BY f1; >> >> which could be rewritten as: >> >> SELECT f1, MAX(f2) FROM (SELECT k, MAX(a+b) FROM t1 GROUP BY k >> UNION SELECT k, MAX(c+d) FROM t2 GROUP BY k) AS tt(f1, f2) >> GROUP BY f1; >> >> (Here, we cannot eliminate the outer grouping because UNION may give us >> groups with more than one row). >> >> My point is that doing the materialization of the derived query is probably >> much more costly than the speed gain you achieve by doing skip-scan in >> the materialized index. If we do this optimization, it would be obsolete >> when a query rewrite facility is available. >> >> Another problem is that it materializes the table during optimization, >> which IMHO is a technique that we should try to avoid. i.e. a query to be the subject of the loose index scan optimization should have only one table, have GROUP BY clause or employ aggregate functions (only MIN/MAX are allowed) or have the DISTINCT clause. >> Is this the exact specification of requirement for loosescan? If those requirements are met we could generate a covering index over the result of the derived table to allow this optimization. Loose index scan applicability check consists of two parts. First part checks semantic, like presence of GROUP BY clause, whether there are aggregate functions and if they are of acceptable type and so on. If this part succeeds the second part checks the cost of the loose index scan usage. For a derived table we can check only the first part because a key could be created only while result table creation and the cost check relies on the statistics provided by the actual key. If the first check succeeds for the derived table then the key that covers all fields of the derived table result is created and derived table is materialized. After that the second part is checked. However, the derived table could fail the second check. In the worst case (i.e. optimizer chose not to use loose index scan) the query execution will be slower by the time needed to create covering index over the derived table result. In the best case the query execution will be faster, but how much faster depends on the actual data. LLD --- 1. Avoid materialization when possible ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Extending the derived tables handling routines ------------------------------------------ Derived table handling consists of four steps - prepare, optimize, create result table, execute (materialize). Prior to this work, this was done by 2 functions: mysql_derived_prepare handles preparation step and creates the result table. mysql_derived_materialize handles the rest. Now each step has its own function: mysql_derived_prepare - runs prepare step on a given derived table. mysql_derived_optimize - run optimization step on a given derived table. mysql_derived_create - create result table for a given derived table. mysql_derived_materialize - materialize a given derived table. Derived tables handling steps ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Prepare ------- This is the only derived tables handling step left at the open_table function. Although physically the table is created at the 'create' step the TABLE structure of the result table is instantiated here for the name resolution purposes. Difference is that storage engine isn't called to create actual files. This allows us to add keys later, if needed. Optimize -------- The new addition to the optimization step is the obtaining and saving the estimated number of rows of the derived table since the derived table aren't early materialized anymore and exact row count is unknown for them. This statistics is needed for the optimization of the outer select. The estimated number of rows is calculated during picking the best join order and kept in the JOIN::best_rowcount variable, similar to JOIN::best_read. Also make_join_statistics fills JOIN::best_rowcount in some corner cases. The estimated number of rows is calculated according to following rules: .) If there is no tables then select could produce 1 row at most, thus best_rowcount is set to 1. .) If select's join is expected to produce only 1 row then the best_rowcount is set to 2. This needed to avoid marking the derived table as a const due to imprecise estimated value. >> I would be more happy with a "const" flag and an estimate that is either >> the exact number (if const) or an estimate (if !const), and no such >> "if 1 then" and "if 2 then" stories. >> But I will allow it if you insist :) .) Otherwise best_rowcount is set to number of rows expected from select's join. The st_select_lex_unit::optimize function sums JOIN::best_rowcount values from all underlying joins and saves it to result->estimated_records variable which is used by the upper select as a source of estimated number of records. When subquery consists of only one select the estimate is saved by the mysql_derived_optimize function. >> Why do you need specialized code for this? st_select_lex_unit::optimize() >> works also for a singleton query specification. Optimization of derived tables are done bottom->up. Pair of handle_derived methods is used for this, both methods are supplied with a processor which is called upon appropriate derived tables. SELECT_LEX::handle_derived method calls TABLE_LIST::handle_derived on all derived tables in the select. In it's turn TABLE_LIST::handle_derived calls SELECT_LEX::handle_derived for all underlying selects if the derived table is a mergeable view or calls given processor on self if the derived table is materializable. JOIN::optimize outer query 1 v ^ 4 SELECT_LEX::handle_derived(&mysql_derived_optimize) 1 v ^ 4 +<--TABLE_LIST::handle_derived | ^ 1 | 2 ^ 3 v ^ 4 | | | | mysql_handle_single_derived | | | | 3 | ^ 4 | | v | v | derived table's | | SELECT_LEX::handle_derived mysql_derived_optimize subquery | | | ^ v ^ | +-----+ | JOIN::optimize +------------+ | ^ ^^^^^^^^^^^^^^^^^^^^^^ v | Optimize all underlying make_join_statistics + choose_plan derived tables before optimizing given derived ^^^^^^^^^^^^^^^^^^^^^^ table itself. Calculate and save estimate For optimization step the mysql_derived_optimize processor is used. It does two things: .) calls to the st_select_lex_unit::optimize function (which calls JOIN::optimize on all selects in the union) .) materializes derived tables with 0 or 1 rows. This is needed because optimizer tries to pre-read such tables before execution. >> It may be better to say "materializes derived tables that are const". >> Tables that have 0 or 1 rows is a special case that applies to MyISAM >> tables only. You also have grouped queries that are known to be 0 or 1 rows, >> but where the underlying data is non-const. Finally you have queries with >> ref access and const values that guarantee to return 0 or 1 rows. After optimization of the derived table is done the estimate could be obtained via the TABLE_LIST::fetch_number_of_rows function. Join optimizer uses table->file->stats.records to get number of table's rows so in order to simplify implementation TABLE_LIST::fetch_number_of_rows saves estimate into this field. Create result table ------------------- The reason for the separate create step is to allow late key creation. But we need instantiated TABLE structure for name resolution purposes, thus only physical table creation is postponed. To do this the select_union::create_result_table function passes additional create_table flag via TMP_TABLE_PARAM structure to the create_tmp_table function. When this flag is FALSE create_tmp_table wouldn't call handler::create to actually create table thus just instantiating the valid TABLE structure. To tell whether the table were actually created the TABLE::created is added. To physically create a result table mysql_derived_create calls handler::create on the given derived table and marks the table as created. For SELECT queries this is done at the beginning of JOIN::exec for each derived table in the same way as for the optimization step, but with mysql_derived_create as the processor. Exceptions are: .) For UPDATE queries for correct error handling derived tables have to be created right after preparation. .) For SHOW COLUMNS derived tables have to be created before call to the mysql_list_fields function. Materialize ----------- Materialized derived table is handled in the same way as a usual table with the only exception that it has to be materialized prior to reading. This is done by the join_materialize_table function. It's a helper function and it just calls the mysql_derived_materialize function to materialize the derived table and restores the read_first_record function. After the join order and access methods has chosen, make_join_readinfo saves >> are chosen chosen read_first_record function of a materialized derived table and substitutes it for the join_materialize_table function. Beside that, a derived table could be materialized in the create_sort_index and add_group_and_distinct_keys functions. Materialization in create_sort_index is needed to correctly handle the case of queries like : SELECT ... FROM (subselect) ORDER BY ... >> Please add: Materialization in create_sort_index() may not be needed when >> it becomes possible to merge a derived query with an outer query. On the add_group_and_distinct_keys see below. After the derived table is materialized the newly added flag TABLE_LIST::materialized is set to TRUE. 2. Speedup execution of queries by adding indexes to derived tables ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.1 Allowing ref access to derived tables ----------------------------------------- Allowing ref access to a derived table is done in 3 steps: a) Gather info on which derived table's fields are used to join this table. b) Generate keys descriptions so optimizer could take them into account when choosing access method. >> access method -> join order and access method c) Drop unused keys descriptions after access method was chosen. >> after access method -> after optimization In more details: a) A good place to gather such info is the add_key_field function. Range optimizer uses this function to gather all keys that could be used in the query. Prior to checking index on the given field add_key_field calls TABLE_LIST::update_derived_keys to update list of possible keys for the derived table. This way derived table's are taken into account. Info on possible keys is maintained on the per-TABLE_LIST base. Each possible key is represented by the DERIVED_KEY_MAP object. It holds info on which fields >> By our naming standard, this class should be called Derived_key_map (or >> perhaps just Derived_key - the map is one component of the class). the possible key consists of and the table map of the table which needs this key. Set of such objects are stored in the TABLE_LIST::derived_keymap_list list >> Personal preference: Do not indicate implementation details in field names, >> a list of derived keys may be called just "derived_keys" instead of >> "derived_key_list". No change needed if you decide to replace the list with >> another structure. and index of the object in the list represents the number of the key. To keep track on keys the TABLE::max_keys variable is added. It is increased when the new TABLE_LIST::derived_keymap_list entry is added. >> Why do you maintain number of keys in TABLE object, while the derived keys >> are collected within TABLE_LIST? And, can you not just query the list for >> element count? The TABLE_LIST::update_derived_keys function accepts a field from the derived table and a set of values (actually fields of other tables) for given field. >> set of names of fields from outer tables? For each value it searches appropriate possible key and updates it to include given field. This field's part_of_key bitmap is also updated accordingly. When there is no entry for a value then a new entry is created. In this case the field's key_start bitmap is updated also. >> Can you explain what key_start_bitmap is? This list is generated at the first execution and kept for all subsequent runs. b) This is enough for range optimizer only to start considering derived table's possible keys. To choose the access method optimizer needs keys descriptions. They are generated by the helper function - TABLE_LIST::generate_derived_keys. update_ref_and_keys function calls it after info on all possible keys has been gathered. It allocates memory for keys descriptions with help of the TABLE::alloc_keys function and then iterates over list of tables and calls TABLE_LIST::generate_keys for each materialized derived one. In it's turn the >> materialized derived one -> materialized derived table? latter function iterates over derived_keymap_list list and for each object calls TABLE::add_tmp_key to actually create key description. For each execution a new result table is created. Thus key descriptions are also re-created using the list of possible keys created at the first run. c) After make_join_statistics has finished (i.e range optimizer has chosen access methods for all tables) derived tables' possible keys that weren't chosen should be dropped to not waste resources. This task is done by the JOIN::drop_unused_derived_keys function. It iterates over table list and for materialized derived ones calls TABLE::use_index with index number chosen by the range optimizer. TABLE::use_index saves given index as index #0 and drops the rest. '-1' given as the index number means no index was chosen and all of them has to be dropped. Eventually, JOIN::exec will be called, it will call mysql_derived_create which will actually create result tables with chosen key. Thus, when subquery execution code will fill table the chosen index will be automatically created by the table engine. The memory allocated for DERIVED_KEY_MAP objects and possible keys descriptions is freed at the TABLE_LIST::cleanup. 2.2 Allowing loose index scan over a derived table -------------------------------------------------- This task is done in the add_group_and_distinct_keys function. It does partial check whether loose index scan is applicable, whether the result table doesn't contain blob fields and if so it creates appropriate covering index. The index on the derived table's result table is created by the TABLE::add_tmp_key function. After this the derived table is materialized with help of the mysql_derived_create and mysql_derived_materialize functions. From this point the derived table is the same as a usual table with an appropriate index. The get_best_group_min_max function does necessary cost evaluation and if it passes well the loose index scan access method is chosen for the derived table. Because loose index scan is essentially an index scan it could has much bigger scan time compared to table scan, so to avoid performance regression actual info on key selectivity have to be used. For obtaining it we have to materialize the derived table. The disadvantage of this optimization is that EXPLAIN will materialize such derived table and that could result in a long-running EXPLAIN. Thanks, Roy