From: Roy Lyseng
Date: September 27 2010 11:32am
Subject: WL5274 - Architecture review
Hi Evgen,

thanks for the revised specification!

I think that we are approaching approval of the architecture review for WL5274.
Apart from some small clarifications, the major disagreement point now is LooseScan.

The whole text is quoted, and comments are prefixed with '>>'.

HLD
---
Postpone materialization of views/subqueries in the FROM clause until the
execution phase.
This will allow fast EXPLAINs for said views/subqueries.
Late materialization will add an index chosen by the range optimizer to the
result table.

HLS
---

Goals:
=====

1. Avoid materialization when possible
2. Speed up execution of queries by adding indexes to derived tables
    2.1 Allowing ref access to derived tables
    2.2 Allowing loose index scan over a derived table

1. Delay materialization until it is needed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following cases:
Derived tables are materialized for EXPLAIN queries, but the result dataset
isn't used since the query itself isn't executed.
Examples:
   EXPLAIN SELECT * FROM (SELECT * FROM a_big_table) AS dt;
   EXPLAIN SELECT * FROM a_materializable_view;

Materialization is useless when a derived table is used in a join and the
partial query plan prior to the derived table produced an empty result. In this
case the query will return an empty result and the materialized data wouldn't
be used at all.
Examples:
   table t1 is (int f1, int f2) and contains values (0,0),(0,1)
   SELECT * FROM t1 JOIN
     (SELECT big.f1 AS b1 FROM a_big_table AS big) AS dt ON t1.f2=dt.b1
     WHERE t1.f1 > 0;
   SELECT * FROM a_materializable_view AS view JOIN t1 ON t1.f2= view.v1
     WHERE t1.f1 > 0;

These cases could be optimized by moving derived table materialization to the
point when the result is actually needed, i.e. right before the materialized
table has to be read.
In the worst case (i.e. the derived tables were materialized) query execution
will take the same time as before since no additional work is done.
In the best case (the derived tables weren't materialized) the execution will
be shorter by the time needed to materialize the derived table(s).

2. Speed up execution of queries by adding indexes to derived tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2.1 Allowing ref access to derived tables
-----------------------------------------
Ref access could greatly reduce the amount of data needed to join a table.
It is used in queries like:
   SELECT * FROM t1 JOIN (SELECT * FROM t2) tt ON t1.f1=tt.f1;
   SELECT * FROM t1 JOIN a_materialized_view AS view ON t1.f1=view.f1;
In these examples, adding an index over field f1 of tt/view will bring it into
consideration by the range optimizer, and if it has the lowest cost, ref access
will be used.
Appropriate indexes could be added by taking the following steps:
   * Create fake key description for the range optimizer:

 >> I prefer the term "possible key" - which you use later - over "fake key"

     For each derived table DT:
     * For each table (or derived table) T which is joined to DT, find the set
       of fields from DT that are used to join table T
     * For each such set, create a key description that consists of all fields
       from the set, with each field included only once
   * The range optimizer will choose (or not, depending on cost) fake keys
     over derived tables to use for ref access.
   * After the query execution plan is chosen, create one real key and drop the
     rest:
     For each derived table DT
     * Actually create the key chosen by the range optimizer
     * Drop the other key descriptions for DT (drop all descriptions if ref
       access isn't used for DT)
   * While the derived table is being materialized, and thus the result table
     is being filled, the storage engine will create the index according to the
     saved key description.
After these steps we will have a materialized derived table with a generated
index, which is exactly the same as a usual table with an index, so execution
will proceed as usual.
Since ref access is chosen over other access methods only when it has lower
cost, the query will always be executed faster than without ref access. If ref
access has a higher cost than some other access method, no real keys will be
created, so in the worst case we lose nothing. The only overhead is the
creation of the fake key descriptions, which is negligible compared to the
cost of executing the whole query.
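
A minimal standalone sketch of the key-grouping step described above (purely
illustrative; the JoinRef type and collect_candidate_keys function are
hypothetical and not server code). It collects, for each table T joined to a
derived table DT, the set of DT fields referenced in the join condition, and
turns each set into one candidate key with every field included only once:

   #include <iostream>
   #include <map>
   #include <set>
   #include <string>
   #include <vector>

   // One equality "T.field = DT.dt_field" taken from a join condition.
   struct JoinRef {
     std::string joined_table;  // table T joined to the derived table
     std::string dt_field;      // field of the derived table used in the join
   };

   // Build one candidate key per joining table; a field enters a key only once.
   std::vector<std::vector<std::string>>
   collect_candidate_keys(const std::vector<JoinRef> &refs) {
     std::map<std::string, std::vector<std::string>> fields_per_table;
     std::map<std::string, std::set<std::string>> seen;
     for (const JoinRef &r : refs) {
       if (seen[r.joined_table].insert(r.dt_field).second)
         fields_per_table[r.joined_table].push_back(r.dt_field);
     }
     std::vector<std::vector<std::string>> keys;
     for (const auto &p : fields_per_table) keys.push_back(p.second);
     return keys;
   }

   int main() {
     // t1 joins DT on (f1, f2), t2 joins DT on (f2); f1 repeated on purpose.
     std::vector<JoinRef> refs = {{"t1", "f1"}, {"t1", "f2"},
                                  {"t2", "f2"}, {"t1", "f1"}};
     for (const auto &key : collect_candidate_keys(refs)) {
       std::cout << "candidate key:";
       for (const auto &f : key) std::cout << " " << f;
       std::cout << "\n";  // -> (f1, f2) for t1 and (f2) for t2
     }
   }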

In addition to the keys described above, we could create a composite key which
would contain all fields referenced in the derived table. Such a key could be
used in several cases:
.) the full key is used when the derived table is placed last in the join
.) a prefix of this key could be used in cases where the derived table isn't
    the last one.

 >> Please add a note saying that this is for future, and not part of this WL

2.2 Allowing loose index scan over a derived table
--------------------------------------------------

 >> Please add in a reference to WL#1724 and/or WL#3220 (if we keep this part)

There is a class of queries with GROUP BY which can be greatly sped up by
using loose index scan, because it allows scanning only a fraction of the table
instead of doing a full scan.
Queries which could benefit look like the following:
   SELECT MIN(f1) FROM (SELECT ....) tt;
   SELECT f1, MAX(f2) FROM (SELECT ... UNION SELECT ...) tt GROUP BY 1;
   SELECT DISTINCT * FROM a_materializable_view;

 >> I am questioning the usefulness of loose index scan for derived tables
 >> materialization.
 >> Imagine the query:
 >> SELECT MIN(f1) FROM (SELECT a+b AS f1 FROM t1 JOIN t2 ...)
 >> A more efficient way would be to rewrite this into:
 >> SELECT MIN(a+b) AS f1 FROM t1 JOIN t2 ...
 >>
 >> Another example:
 >> SELECT f1, MAX(f2) FROM (SELECT k, a+b FROM t1
 >>                    UNION SELECT k, c+d FROM t2) AS tt(f1, f2)
 >> GROUP BY f1;
 >>
 >> which could be rewritten as:
 >>
 >> SELECT f1, MAX(f2) FROM (SELECT k, MAX(a+b) FROM t1 GROUP BY k
 >>                    UNION SELECT k, MAX(c+d) FROM t2 GROUP BY k) AS tt(f1, f2)
 >> GROUP BY f1;
 >>
 >> (Here, we cannot eliminate the outer grouping because UNION may give us
 >> groups with more than one row).
 >>
 >> My point is that doing the materialization of the derived query is probably
 >> much more costly than the speed gain you achieve by doing skip-scan in
 >> the materialized index. If we do this optimization, it would be obsolete
 >> when a query rewrite facility is available.
 >>
 >> Another problem is that it materializes the table during optimization,
 >> which IMHO is a technique that we should try to avoid.

I.e. a query that is to be the subject of the loose index scan optimization
should have only one table, have a GROUP BY clause or employ aggregate
functions (only MIN/MAX are allowed), or have the DISTINCT clause.

 >> Is this the exact specification of requirement for loosescan?

If those requirements are met we could generate a covering index over the
result of the derived table to allow this optimization.
The loose index scan applicability check consists of two parts. The first part
checks semantics, such as the presence of a GROUP BY clause, whether there are
aggregate functions and whether they are of an acceptable type, and so on. If
this part succeeds, the second part checks the cost of using loose index scan.
For a derived table we can check only the first part up front, because a key
can be created only when the result table is created, and the cost check
relies on the statistics provided by the actual key. If the first check
succeeds for the derived table, a key that covers all fields of the derived
table result is created and the derived table is materialized. After that the
second part is checked. However, the derived table could still fail the second
check.
In the worst case (i.e. the optimizer chose not to use loose index scan) query
execution will be slower by the time needed to create the covering index over
the derived table result.
In the best case query execution will be faster, but how much faster depends
on the actual data.

LLD
---

1. Avoid materialization when possible
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Extending the derived tables handling routines
------------------------------------------
Derived table handling consists of four steps: prepare, optimize,
create result table, execute (materialize). Prior to this work, this was done
by two functions: mysql_derived_prepare handled the preparation step and
created the result table, and mysql_derived_materialize handled the rest.
Now each step has its own function:
   mysql_derived_prepare - runs the prepare step on a given derived table.
   mysql_derived_optimize - runs the optimization step on a given derived table.
   mysql_derived_create - creates the result table for a given derived table.
   mysql_derived_materialize - materializes a given derived table.
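
To make the split concrete, here is a heavily simplified sketch of the
intended call sequence (the DerivedTable struct and the derived_* helpers are
hypothetical stand-ins, not the real TABLE_LIST interface):

   #include <iostream>

   // Hypothetical, heavily simplified state of a derived table; not the
   // real TABLE_LIST/TABLE structures.
   struct DerivedTable {
     bool prepared = false, optimized = false;
     bool created = false, materialized = false;
   };

   // Each step is now a separate function, so the later ones can be delayed.
   void derived_prepare(DerivedTable &dt)     { dt.prepared = true; }
   void derived_optimize(DerivedTable &dt)    { dt.optimized = true; }
   void derived_create(DerivedTable &dt)      { dt.created = true; }
   void derived_materialize(DerivedTable &dt) { dt.materialized = true; }

   int main() {
     DerivedTable dt;
     derived_prepare(dt);   // at open_table(): TABLE structure for name resolution
     derived_optimize(dt);  // at JOIN::optimize(): row estimate, key descriptions
     // For EXPLAIN we can stop here: nothing was physically created or filled.
     bool explain_only = true;
     if (!explain_only) {
       derived_create(dt);      // at JOIN::exec(): physical table with chosen key
       derived_materialize(dt); // right before the first read of the table
     }
     std::cout << "created=" << dt.created
               << " materialized=" << dt.materialized << "\n";  // 0 0 for EXPLAIN
   }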

Derived tables handling steps
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prepare
-------
This is the only derived table handling step left in the open_table function.
Although the table is physically created at the 'create' step, the TABLE
structure of the result table is instantiated here for name resolution
purposes. The difference is that the storage engine isn't called to create the
actual files. This allows us to add keys later, if needed.

Optimize
--------
The new addition to the optimization step is obtaining and saving the
estimated number of rows of the derived table, since derived tables aren't
materialized early anymore and their exact row count is unknown. These
statistics are needed for the optimization of the outer select. The estimated
number of rows is calculated while picking the best join order and kept in the
JOIN::best_rowcount variable, similar to JOIN::best_read. Also
make_join_statistics fills JOIN::best_rowcount in some corner cases.
The estimated number of rows is calculated according to the following rules:
.) If there are no tables, the select can produce at most 1 row, thus
    best_rowcount is set to 1.
.) If the select's join is expected to produce only 1 row, best_rowcount is
    set to 2. This is needed to avoid marking the derived table as const
    due to an imprecise estimate.

 >> I would be more happy with a "const" flag and an estimate that is either
 >> the exact number (if const) or an estimate (if !const), and no such
 >> "if 1 then" and "if 2 then" stories.
 >> But I will allow it if you insist :)

.) Otherwise best_rowcount is set to the number of rows expected from the
    select's join.
The st_select_lex_unit::optimize function sums the JOIN::best_rowcount values
from all underlying joins and saves the sum in the result->estimated_records
variable, which the upper select uses as the source of the estimated number of
records. When the subquery consists of only one select, the estimate is saved
by the mysql_derived_optimize function.

 >> Why do you need specialized code for this? st_select_lex_unit::optimize()
 >> works also for a singleton query specification.
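
Purely as an illustration of the estimation rules above (the helper below is
hypothetical and not part of the server code):

   #include <cstdint>
   #include <iostream>

   using ha_rows = std::uint64_t;

   // Mirrors the three rules: no tables -> 1, exactly one expected row -> 2
   // (so the derived table isn't treated as const), otherwise the estimate.
   ha_rows derived_rowcount_estimate(unsigned table_count, ha_rows expected_rows) {
     if (table_count == 0) return 1;
     if (expected_rows == 1) return 2;
     return expected_rows;
   }

   int main() {
     std::cout << derived_rowcount_estimate(0, 0) << "\n";    // 1
     std::cout << derived_rowcount_estimate(2, 1) << "\n";    // 2
     std::cout << derived_rowcount_estimate(2, 1000) << "\n"; // 1000
   }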

Optimization of derived tables is done bottom-up. A pair of handle_derived
methods is used for this; both methods are supplied with a processor which is
called on the appropriate derived tables. The SELECT_LEX::handle_derived method
calls TABLE_LIST::handle_derived on all derived tables in the select. In its
turn, TABLE_LIST::handle_derived calls SELECT_LEX::handle_derived for all
underlying selects if the derived table is a mergeable view, or calls the given
processor on itself if the derived table is materializable.

       JOIN::optimize                                             outer query
         1 v     ^ 4
SELECT_LEX::handle_derived(&mysql_derived_optimize)
         1 v                 ^ 4
  +<--TABLE_LIST::handle_derived
  |  ^   1 | 2 ^       3  v  ^ 4
  |  |     |   |       mysql_handle_single_derived
  |  |     |   |                       3 |  ^ 4
  |  |     v   |                         v  |                   derived table's
  |  | SELECT_LEX::handle_derived     mysql_derived_optimize      subquery
  |  |     |   ^                         v  ^
  |  +-----+   |                      JOIN::optimize
  +------------+                         |  ^
^^^^^^^^^^^^^^^^^^^^^^                  v  |
Optimize all underlying         make_join_statistics + choose_plan
derived tables before
optimizing given derived       ^^^^^^^^^^^^^^^^^^^^^^
table itself.                  Calculate and save estimate
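
A rough, self-contained miniature of the bottom-up traversal shown in the
diagram (all types here are hypothetical stand-ins for SELECT_LEX and
TABLE_LIST; the real code also handles units, merging and error returns):

   #include <functional>
   #include <iostream>
   #include <string>
   #include <vector>

   struct Select;

   struct Derived {
     std::string name;
     bool mergeable;                    // mergeable view vs. materializable
     std::vector<Select *> underlying;  // selects of the derived table's unit
   };

   struct Select {
     std::vector<Derived *> derived_tables;
   };

   using Processor = std::function<void(Derived &)>;

   void handle_derived_select(Select &sl, const Processor &proc);

   void handle_derived_table(Derived &dt, const Processor &proc) {
     // Bottom-up: handle everything underneath first.
     for (Select *under : dt.underlying) handle_derived_select(*under, proc);
     if (!dt.mergeable) proc(dt);  // only materializable tables get processed
   }

   void handle_derived_select(Select &sl, const Processor &proc) {
     for (Derived *dt : sl.derived_tables) handle_derived_table(*dt, proc);
   }

   int main() {
     Select inner;                      // subquery with its own derived table
     Derived leaf{"leaf", false, {}};
     inner.derived_tables.push_back(&leaf);
     Derived outer_dt{"outer_dt", false, {&inner}};
     Select outer;
     outer.derived_tables.push_back(&outer_dt);
     // Prints "leaf" before "outer_dt": underlying tables are handled first.
     handle_derived_select(outer,
                           [](Derived &d) { std::cout << d.name << "\n"; });
   }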

For the optimization step the mysql_derived_optimize processor is used. It does
two things:
.) calls the st_select_lex_unit::optimize function (which calls
    JOIN::optimize on all selects in the union)
.) materializes derived tables with 0 or 1 rows. This is needed because the
    optimizer tries to pre-read such tables before execution.

 >> It may be better to say "materializes derived tables that are const".
 >> Tables that have 0 or 1 rows is a special case that applies to MyISAM
 >> tables only. You also have grouped queries that are known to be 0 or 1 rows,
 >> but where the underlying data is non-const. Finally you have queries with
 >> ref access and const values that guarantee to return 0 or 1 rows.

After optimization of the derived table is done, the estimate can be obtained
via the TABLE_LIST::fetch_number_of_rows function. The join optimizer uses
table->file->stats.records to get the number of rows in a table, so in order
to simplify the implementation, TABLE_LIST::fetch_number_of_rows saves the
estimate into this field.

Create result table
-------------------
The reason for the separate create step is to allow late key creation. But
we need an instantiated TABLE structure for name resolution purposes, thus only
physical table creation is postponed. To do this, the
select_union::create_result_table function passes an additional create_table
flag via the TMP_TABLE_PARAM structure to the create_tmp_table function. When
this flag is FALSE, create_tmp_table won't call handler::create to actually
create the table, thus just instantiating a valid TABLE structure. To tell
whether the table was actually created, the TABLE::created flag is added. To
physically create a result table, mysql_derived_create calls handler::create on
the given derived table and marks the table as created. For SELECT queries this
is done at the beginning of JOIN::exec for each derived table, in the same way
as for the optimization step, but with mysql_derived_create as the processor.
Exceptions are:
.) For UPDATE queries, for correct error handling, derived tables have to be
    created right after preparation.
.) For SHOW COLUMNS, derived tables have to be created before the call to the
    mysql_list_fields function.
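
A simplified sketch of the deferred physical creation, assuming a hypothetical
Table struct whose 'created' flag plays the role of TABLE::created:

   #include <cassert>
   #include <iostream>

   struct Table {
     bool created = false;   // counterpart of the new TABLE::created flag
   };

   // create_table == false: only set up the in-memory structure, so keys can
   // still be added before the storage engine is asked to create files.
   Table make_tmp_table(bool create_table) {
     Table t;
     if (create_table) t.created = true;   // handler::create() would run here
     return t;
   }

   // Physical creation, done later (e.g. at the start of execution).
   void derived_create(Table &t) {
     if (!t.created) {
       // handler::create() would run here with the finally chosen key
       t.created = true;
     }
   }

   int main() {
     Table t = make_tmp_table(/*create_table=*/false);  // prepare step
     assert(!t.created);      // nothing on disk yet; EXPLAIN can stop here
     derived_create(t);       // only done when the query is really executed
     std::cout << "created=" << t.created << "\n";      // 1
   }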

Materialize
-----------
A materialized derived table is handled in the same way as a usual table, with
the only exception that it has to be materialized prior to reading. This is
done by the join_materialize_table function. It's a helper function that just
calls the mysql_derived_materialize function to materialize the derived table
and then restores the read_first_record function.
After the join order and access methods has chosen, make_join_readinfo saves

 >> are chosen

the chosen read_first_record function of a materialized derived table and
substitutes the join_materialize_table function for it. Besides that, a derived
table could be materialized in the create_sort_index and
add_group_and_distinct_keys functions.
Materialization in create_sort_index is needed to correctly handle the case of
queries like: SELECT ... FROM (subselect) ORDER BY ...

 >> Please add: Materialization in create_sort_index() may not be needed when
 >> it becomes possible to merge a derived query with an outer query.

For add_group_and_distinct_keys, see below.
After the derived table is materialized, the newly added flag
TABLE_LIST::materialized is set to TRUE.
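
A minimal sketch of the read-hook substitution described above (the JoinTab
struct is a hypothetical stand-in for the real join structures):

   #include <iostream>

   // The first-read hook of the derived table is replaced by a wrapper that
   // materializes the table, restores the saved hook and then does the real
   // first read.
   struct JoinTab {
     bool materialized = false;
     int (*read_first_record)(JoinTab *) = nullptr;  // current first-read hook
     int (*saved_read_first)(JoinTab *) = nullptr;   // hook chosen by the optimizer
   };

   int real_first_read(JoinTab *tab) {
     std::cout << "reading first row (materialized="
               << tab->materialized << ")\n";
     return 0;
   }

   int join_materialize_table(JoinTab *tab) {
     tab->materialized = true;                        // mysql_derived_materialize()
     tab->read_first_record = tab->saved_read_first;  // restore the real hook
     return tab->read_first_record(tab);
   }

   int main() {
     JoinTab tab;
     tab.saved_read_first = real_first_read;   // saved by make_join_readinfo()
     tab.read_first_record = join_materialize_table;
     tab.read_first_record(&tab);  // first read: materializes, then reads
     tab.read_first_record(&tab);  // later reads go straight to the real hook
   }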

2. Speed up execution of queries by adding indexes to derived tables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2.1 Allowing ref access to derived tables
-----------------------------------------

Allowing ref access to a derived table is done in 3 steps:
a) Gather info on which of the derived table's fields are used to join this
    table.
b) Generate key descriptions so the optimizer can take them into account when
    choosing access method.

 >> access method -> join order and access method

c) Drop unused key descriptions after access method was chosen.

 >> after access method -> after optimization

In more detail:
a) A good place to gather such info is the add_key_field function. The range
optimizer uses this function to gather all keys that could be used in the
query. Prior to checking the index on the given field, add_key_field calls
TABLE_LIST::update_derived_keys to update the list of possible keys for the
derived table. This way derived tables are taken into account.
Info on possible keys is maintained on a per-TABLE_LIST basis. Each possible
key is represented by a DERIVED_KEY_MAP object. It holds info on which fields

 >> By our naming standard, this class should be called Derived_key_map (or
 >> perhaps just Derived_key - the map is one component of the class).

the possible key consists of, and the table map of the table which needs this
key. A set of such objects is stored in the TABLE_LIST::derived_keymap_list list

 >> Personal preference: Do not indicate implementation details in field names,
 >> a list of derived keys may be called just "derived_keys" instead of
 >> "derived_key_list". No change needed if you decide to replace the list with
 >> another structure.

and the index of the object in the list represents the number of the key. To
keep track of keys, the TABLE::max_keys variable is added. It is increased when
a new TABLE_LIST::derived_keymap_list entry is added.

 >> Why do you maintain number of keys in TABLE object, while the derived keys
 >> are collected within TABLE_LIST? And, can you not just query the list for
 >> element count?

The TABLE_LIST::update_derived_keys function accepts a field from the derived
table and a set of values (actually fields of other tables) for the given
field.

 >> set of names of fields from outer tables?

For each value it searches for the appropriate possible key and updates it to
include the given field. This field's part_of_key bitmap is also updated
accordingly. When there is no entry for a value, a new entry is created. In
this case the field's key_start bitmap is also updated.

 >> Can you explain what key_start_bitmap is?

This list is generated at the first execution and kept for all subsequent
runs.
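
As an illustration of this bookkeeping, a simplified sketch follows; the
DerivedKey/DerivedTable types and the plain integer bitmaps are hypothetical
stand-ins for DERIVED_KEY_MAP, part_of_key and key_start:

   #include <cstdint>
   #include <iostream>
   #include <vector>

   using table_map = std::uint64_t;

   struct DerivedKey {
     table_map referred_by;    // map of the table that needs this key
     std::vector<int> parts;   // field numbers forming the possible key
   };

   struct DerivedTable {
     std::vector<DerivedKey> keys;            // per-TABLE_LIST possible keys
     std::vector<std::uint64_t> part_of_key;  // per field: keys it belongs to
     std::vector<std::uint64_t> key_start;    // per field: keys it starts
   };

   // Register that 'field_no' of the derived table is compared with a value
   // from the table identified by 'referring_map'.
   void update_derived_keys(DerivedTable &dt, int field_no,
                            table_map referring_map) {
     for (std::size_t i = 0; i < dt.keys.size(); ++i) {
       if (dt.keys[i].referred_by == referring_map) {  // existing key: extend it
         for (int p : dt.keys[i].parts)
           if (p == field_no) return;                  // already a key part
         dt.keys[i].parts.push_back(field_no);
         dt.part_of_key[field_no] |= 1ULL << i;
         return;
       }
     }
     // No key for this referring table yet: start a new possible key.
     std::size_t i = dt.keys.size();
     dt.keys.push_back({referring_map, {field_no}});
     dt.part_of_key[field_no] |= 1ULL << i;
     dt.key_start[field_no] |= 1ULL << i;   // this field opens key number i
   }

   int main() {
     DerivedTable dt{{}, std::vector<std::uint64_t>(4, 0),
                     std::vector<std::uint64_t>(4, 0)};
     update_derived_keys(dt, 0, /*t1*/ 1);  // t1.x = dt.f0 -> key 0 starts with f0
     update_derived_keys(dt, 1, /*t1*/ 1);  // t1.y = dt.f1 -> f1 added to key 0
     update_derived_keys(dt, 1, /*t2*/ 2);  // t2.z = dt.f1 -> key 1 starts with f1
     std::cout << "possible keys: " << dt.keys.size() << "\n";  // 2
   }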

b) This alone is only enough for the range optimizer to start considering the
derived table's possible keys. To choose the access method the optimizer needs
key descriptions. They are generated by a helper function,
TABLE_LIST::generate_derived_keys. The update_ref_and_keys function calls it
after info on all possible keys has been gathered. It allocates memory for key
descriptions with the help of the TABLE::alloc_keys function and then iterates
over the list of tables and calls TABLE_LIST::generate_keys for each
materialized derived one. In its turn the

 >> materialized derived one -> materialized derived table?

latter function iterates over the derived_keymap_list and for each object
calls TABLE::add_tmp_key to actually create the key description.
For each execution a new result table is created. Thus key descriptions are
also re-created, using the list of possible keys created at the first run.

c) After make_join_statistics has finished (i.e. the range optimizer has chosen
access methods for all tables), the derived tables' possible keys that weren't
chosen should be dropped so as not to waste resources. This task is done by
the JOIN::drop_unused_derived_keys function. It iterates over the table list
and for materialized derived tables calls TABLE::use_index with the index
number chosen by the range optimizer. TABLE::use_index saves the given index as
index #0 and drops the rest. '-1' given as the index number means no index was
chosen and all of them have to be dropped.

Eventually JOIN::exec will be called; it will call mysql_derived_create, which
will actually create the result tables with the chosen key. Thus, when the
subquery execution code fills the table, the chosen index will be automatically
created by the storage engine.

The memory allocated for DERIVED_KEY_MAP objects and possible key descriptions
is freed in TABLE_LIST::cleanup.
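
A tiny sketch of the keep-one-drop-the-rest behaviour of TABLE::use_index
described in step c) (the Table struct and use_index function below are
hypothetical simplifications):

   #include <iostream>
   #include <string>
   #include <vector>

   struct Table {
     std::vector<std::string> keys;   // possible key descriptions
   };

   // Keep only the key chosen by the optimizer, move it to slot #0 and drop
   // everything else; -1 means no key was chosen, so drop them all.
   void use_index(Table &t, int chosen) {
     if (chosen < 0) { t.keys.clear(); return; }
     std::string kept = t.keys[chosen];
     t.keys.clear();
     t.keys.push_back(kept);          // the chosen key becomes key #0
   }

   int main() {
     Table t{{"key_for_t1(f1,f2)", "key_for_t2(f2)"}};
     use_index(t, 1);                 // optimizer picked the second key
     std::cout << t.keys.size() << " key(s), #0 = " << t.keys[0] << "\n";
   }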

2.2 Allowing loose index scan over a derived table
--------------------------------------------------
This task is done in the add_group_and_distinct_keys function.
It does a partial check of whether loose index scan is applicable and whether
the result table contains no blob fields, and if so it creates an appropriate
covering index. The index on the derived table's result table is created by
the TABLE::add_tmp_key function. After this the derived table is materialized
with the help of the mysql_derived_create and mysql_derived_materialize
functions. From this point the derived table is the same as a usual table with
an appropriate index.
The get_best_group_min_max function does the necessary cost evaluation, and if
it passes, the loose index scan access method is chosen for the derived
table.
Because loose index scan is essentially an index scan, it could have a much
bigger scan time compared to a table scan, so to avoid a performance regression
actual info on key selectivity has to be used. To obtain it we have to
materialize the derived table.
The disadvantage of this optimization is that EXPLAIN will materialize such a
derived table, which could result in a long-running EXPLAIN.

Thanks,
Roy