List: Summer of Code
From: scut_tang  Date: July 5 2009 1:20pm
Subject: GSoC I_S/P_S storage engine midterm report
The midterm surveys for Google Summer of Code 2009 are coming next week, so here is my
midterm report summarizing my work in this period, although the report is not

The project has five main stages:

1. Implement the definitions of all INFORMATION_SCHEMA tables;

2. Make all tables return the right data;

3. Implement index operations for some tables, not all tables;

4. Implement privileges;

5. Remove the old implementation of INFORMATION_SCHEMA and verify the new

I have now completed the first stage, have some ideas for stage two, and have started
coding it. The INFORMATION_SCHEMA tables are fixed in the MySQL server, and users are not
allowed to modify them. Over the past month, three proposals for implementing the table
definitions were put forward:

1. Like the PERFORMANCE_SCHEMA storage engine, all frm files are pre-created by the
mysql_install_db script. But this causes numerous problems. For example, users can
perform CREATE/DROP/ALTER operations on the tables. Besides that, this design doesn’t
really work for plugins. Users can install and unload a plugin at any time, so the frm
files would have to be created and removed automatically. A user can also load a new
version of a plugin with a different structure for the same INFORMATION_SCHEMA table, and
the table must then be created automatically in the new format. So this design would be
really painful.

2. Under Sergei’s guidance, I wanted to use function ‘discover’ to discover
INFORMATION_SCHEMA tables. Function ‘discover’ is used by the NDB storage engine (a
cluster storage engine). In this design, function ‘discover’ of the INFORMATION_SCHEMA
storage engine has to generate frm-like contents. This proposal looks nice, but it is
complicated, and generating frm-like contents is hard for me.

3. Sergei’s idea: function ‘discover’ returns a filled TABLE_SHARE instead of frm-like
contents. Normally, MySQL opens a table like this: find the frm file -> fill TABLE_SHARE
from the frm contents -> generate TABLE from TABLE_SHARE. Here the engine fills
TABLE_SHARE directly from the table definition, so the step of getting the frm can be
skipped. The design is easy to implement; its weakness is that the prototype of
‘discover’ changes, so the NDB and Archive storage engines have to be modified too. Those
modifications are not hard to do. Actually, I had a similar idea earlier, but dropped it
because it required changing the ‘discover’ interface.
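
To make the difference in the third proposal concrete, here is a minimal C++ sketch of a
discovery function that fills a share directly from a built-in definition, with no frm
image in between. Everything in it is an assumption made for illustration:
TableShareSketch is a tiny stand-in for the real TABLE_SHARE, discover_share and
kEnginesColumns are hypothetical names, and the column list is abbreviated. It is not the
actual ‘discover’ prototype.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for TABLE_SHARE; the real structure holds far more.
struct TableShareSketch {
    std::string db;
    std::string table_name;
    std::vector<std::string> fields;  // column names only, for brevity
};

// Hypothetical built-in definition of INFOSCHEMA.ENGINES (abbreviated).
static const std::vector<std::string> kEnginesColumns = {
    "ENGINE", "SUPPORT", "COMMENT"};

// Sketch of proposal 3: fill the share straight from the definition.
// Returns true when the table is known to the engine, false otherwise.
bool discover_share(const std::string& db, const std::string& name,
                    TableShareSketch* share) {
    assert(share != nullptr);
    if (db != "INFOSCHEMA") return false;  // not handled by this engine
    if (name == "ENGINES") {
        share->db = db;
        share->table_name = name;
        share->fields = kEnginesColumns;
        return true;
    }
    return false;  // unknown table; *share is left untouched
}
```

A caller would pass an empty TableShareSketch and use it only when the function returns
true; in proposal 2 the same function would instead have to hand back a frm-like byte
image for the server to parse.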

In the end, the third proposal was accepted; it is implemented and passes my tests. Of
course, the code still has to pass Sergei’s review.

The first step is the costly one. My earlier experience was mainly with storage engines,
not with the MySQL internals as a whole, so it took me time to get familiar with the
layers above the storage engine level. After the work on the first stage, I believe the
next four stages will go more quickly than the first. Here is my project schedule:

7.6 – 7.20: Stage 2

7.20 – 7.31: Stage 3

8.1 – 8.10: Stage 4

8.11 – 8.17: Stage 5

Because the final code submission date is Sep. 3, there may be enough time left to polish
the INFORMATION_SCHEMA storage engine further.

I think the design of PERFORMANCE_SCHEMA is good, so I modeled the INFORMATION_SCHEMA
storage engine on it. The code contains:

1., ha_infoschema.h: The main storage engine files;

2., infos_table.h: The abstract base class for all tables. The tables are classified by
whether they are readable or writable. I have put all tables in the readable type at this
stage, and will move each table into the right type when I implement it. Sergei said some
tables had write/modify operations before, but I think all INFORMATION_SCHEMA tables are
read-only for users. So would it be better to classify tables by whether they have an
index? For example, table ENGINES doesn’t need an index, but table COLUMNS would perform
better with one.

3., table_xxx.h: The definition of table xxx and the implementation of its operations.
There are 34 tables.
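
The layout above can be pictured with a small C++ sketch: one abstract base class plus one
subclass per table, with the index question expressed as a virtual flag. This is only an
illustration under stated assumptions. InfosTableSketch, TableEnginesSketch and
has_index are hypothetical names, the rnd_init/rnd_next names are borrowed from the
handler interface with simplified signatures, and the two rows are made-up sample data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical abstract base class shared by all tables.
class InfosTableSketch {
public:
    virtual ~InfosTableSketch() = default;
    virtual const char* name() const = 0;
    // Tables that would benefit from index access override this.
    virtual bool has_index() const { return false; }

    // Start a full scan from the first row.
    void rnd_init() { pos_ = 0; }
    // Copy the next row into *out; return false once the scan is done.
    bool rnd_next(Row* out) {
        assert(out != nullptr);
        if (pos_ >= row_count()) return false;
        *out = row_at(pos_++);
        return true;
    }

protected:
    virtual std::size_t row_count() const = 0;
    virtual Row row_at(std::size_t i) const = 0;

private:
    std::size_t pos_ = 0;
};

// Hypothetical table_engines: read-only, small, no index needed.
class TableEnginesSketch : public InfosTableSketch {
public:
    const char* name() const override { return "ENGINES"; }

protected:
    std::size_t row_count() const override { return rows_.size(); }
    Row row_at(std::size_t i) const override { return rows_.at(i); }

private:
    // Made-up sample rows (ENGINE, SUPPORT), for illustration only.
    std::vector<Row> rows_ = {Row{"MyISAM", "DEFAULT"}, Row{"MEMORY", "YES"}};
};
```

A table such as COLUMNS would be another subclass that returns true from has_index and
adds lookup methods, so classifying by index only needs this one flag in the base class.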

In addition, because the old INFORMATION_SCHEMA implementation still exists, queries on
INFORMATION_SCHEMA are still executed in the original way. In order to test the new
INFORMATION_SCHEMA storage engine implementation, it uses the new name ‘INFOSCHEMA’. For
example, to test table ENGINES, execute “SELECT * FROM INFOSCHEMA.ENGINES;”, not
Because of problems with Launchpad, I sent the code to Sergei as a diff file, whose parent is

Sergei Golubchik, thank you very much for your patient guidance and help! I am really
looking forward to learning more from you.




