MySQL Lists are EOL. Please join:

List:Internals« Previous MessageNext Message »
From:shawn l.green Date:November 13 2015 1:27am
Subject:Re: Developing and Index Advisor for MySQL
View as plain text  
Hello Eduardo,

On 11/12/2015 7:17 PM, Eduardo Weiland wrote:
> Quoting Physical Database Design [1]:
>
> "index statistics can be sufficiently estimated for the purposes of
> physical design advisors from relatively straightforward calculations
> based on the table and columns statistics of the underlying table that
> the virtual index will be based on."
>
> There are already tools like this for other RDBMS: Microsoft SQL Server
> Database Tuning Advisor, Oracle Access Advisor and IBM DB2 Design
> Advisor. I just want to create a tool that does the same thing for MySQL.
>
> Furthermore I intend to analyze all the SQL queries, including SELECTs,
> INSERTs, UPDATEs and DELETEs. Thus I want to find some index
> configuration that is good enough for the entire workload. I do know
> that workload can vary over time and the optimizations become outdated,
> but this is an academic research and I am assuming a constant workload.
> Adapt to dynamic workload changes is future work.
>
> I plan to use some statistics such as how many times each query is
> executedto make queries that run most often to perform better. So, if
> there are more INSERTs and UPDATEs than SELECTs in a given table, I
> intend to recommend less indexes for this table, and even suggest to
> drop some existing indexes if needed.
>
>
>
> [1] LIGHTSTONE S., TEOREY T., NADEAU T. Physical Database Design: The
> Database Professional’s Guide to Exploiting Indexes, Views, Storage, and
> More. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2007.
> ISBN 9780080552316.
>
>
>
> Em 12-11-2015 18:56, shawn l.green escreveu:
>> Hello Eduardo,
>>
>> On 11/5/2015 4:58 AM, Eduardo Weiland wrote:
>>> Hello there,
>>>
>>> I am new here in the list. I am from Brazil and I am finishing my
>>> bachelor's degree in Computer Science.
>>>
>>> I have to do a final work for the course, and (I don't know why) I
>>> came out with the idea of developing an index advisor for MySQL.
>>>
>>> Just to be clear, by index advisor I mean some tool that suggests
>>> what indices can be created in the database, based on statistics and
>>> query log. In other words, indexes that do not exist in the database
>>> but can be created by the administrator in order to improve query
>>> execution performance.
>>>
>>> I have never worked with MySQL internals before. I know a lot of C
>>> and C++ programming, so I just need some help to understand MySQL
>>> structure.
>>>
>>> My main inspirations for doing this work are [1] and [2] (actually,
>>> they are the same work, but two different articles). One important
>>> thing I have found in [1]:
>>>
>>>      Before making the decisions, the optimizer allows the developer
>>> to override the information about physical design by using several
>>> function ╠│hooks‘. The hooks can be replaced at runtime with functions
>>> that insert new stastistics information into the list of physical
>>> design features. This makes the optimizer believe that the newly
>>> inserted data regarding the what-if indexes and what-if tables are
>>> present in the database. Then, the optimizer selects the execution
>>> plans using the statistics from the what-if features.
>>>
>>> I guess that this is what I need. Some way to virtually create
>>> indexes and then calling the MySQL optimizer or some sort of EXPLAIN
>>> method to understand "what MySQL optimizer thinks about this index?
>>> Would this index be used if it existed?".
>>>
>>> I am probably wrong about all I have said here, but I want to hear
>>> what you think about this.
>>>
>>> I am very grateful for all your help.
>>>
>>> Att.
>>>
>>> Eduardo Weiland
>>> http://eduardoweiland.info
>>>
>>> [1] PARINDA: an interactive physical designer for PostgreSQL
>>> http://dl.acm.org/citation.cfm?id=1739131
>>> [2] An automated, yet interactive and portable DB designer
>>> http://dl.acm.org/citation.cfm?id=1807167.1807314
>>>
>>>
>>
>> The drawback to your approach is that you won't have the cardinality
>> estimates for your index trials until you actually test those tuple
>> combinations for uniqueness.  It is the cardinality that the optimizer
>> bases its estimates on.
>>
>> Other approaches (most of them are manual) have been to examine the
>> query patterns (available in digest form from the PERFORMANCE SCHEMA)
>> or from the actual query statistics in the Slow query log (which can
>> be converted to patterns by tools like mysqldumpslow) to look for
>> patterns that are both
>> a) frequently used
>> b) examining many more rows than they need to return
>>
>> Quite often, they can be improved by adding indexes to certain tables.
>> In many cases, though, it was because the query used a dependent
>> subquery which simply had to search a lot of row combinations.
>>
>> So, it is easy to suggest indexes that would solve certain queries
>> quickly but over-indexing a table (creating more indexes than you
>> need) only hurts INSERT, UPDATE, and DELETE performance if you are
>> trying to improve a query you only execute rarely.
>>
>


You said (in your top-posted response)
 > "index statistics can be sufficiently estimated for the purposes of
 > physical design advisors from relatively straightforward calculations
 > based on the table and columns statistics of the underlying table that
 > the virtual index will be based on."

That is exactly the task I said you would need to do when I said "until 
you actually test those tuple combinations for uniqueness".  MySQL does 
not keep or generate statistics for every column of a table. It only 
generates statistics for each index based on the columns used in that 
index.

I perfectly agree with modeling a constant distribution of query types. 
There is no need for you to attempt to make your system dynamic.

Here are some references to help you get started:

* The sys Schema (an intrinsic part of MySQL version 5.7 but can be 
added to 5.6 with ease):
https://github.com/MarkLeith/mysql-sys
(search for any blogs by Mark Leith, the author, for usage ideas)

* Current index usage (useful for finding indexes that should be dropped)
 
http://dev.mysql.com/doc/refman/5.7/en/table-waits-summary-tables.html#table-io-waits-summary-by-index-usage-table

* Useful pages from the manual
http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
http://dev.mysql.com/doc/refman/5.7/en/index-statistics.html
http://dev.mysql.com/doc/refman/5.7/en/index-btree-hash.html
http://dev.mysql.com/doc/refman/5.7/en/innodb-fulltext-index.html
http://dev.mysql.com/doc/refman/5.7/en/index-hints.html
http://dev.mysql.com/doc/refman/5.7/en/index-extensions.html
http://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_covering_index
http://dev.mysql.com/doc/refman/5.7/en/innodb-statistics-estimation.html
http://dev.mysql.com/doc/refman/5.7/en/innodb-persistent-stats.html
http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html

One alternative to changing the tables is to change the query to more 
efficiently use the indexes already on the table. But that would 
probably be your next project :)

http://dev.mysql.com/doc/refman/5.7/en/query-rewrite-plugins.html


Yours,
-- 
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Integrated Cloud Applications & Platform Services
Office: Blountville, TN

Become certified in MySQL! Visit https://www.mysql.com/certification/ 
for details.
Thread
Developing and Index Advisor for MySQLEduardo Weiland5 Nov
  • Re: Developing and Index Advisor for MySQLshawn l.green12 Nov
    • Re: Developing and Index Advisor for MySQLEduardo Weiland13 Nov
      • Re: Developing and Index Advisor for MySQLshawn l.green13 Nov