From: Ann W. Harrison Date: December 10 2008 6:57pm Subject: Re: [Fwd: Re: More Falcon fun] List-Archive: http://lists.mysql.com/falcon/296 Message-Id: <49401102.9020101@mysql.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=ISO-8859-1 Content-Transfer-Encoding: 7BIT > Kevin Lewis wrote: >> >> Apparently, InnoDB can make one pass through the database and >> collect the index values for multiple new indexes, building >> them all at once. Falcon could do the same building multiple >> deferredIndexes and let the gopher threads affectively build >> the indexes from presorted deferredIndex groups. >> >> But I do not think that Falcon receives a multiple createIndex call. >> can you check how InnoDB does this and how Falcon handles something >> like: >>>> ALTER TABLE trade >>>> ADD INDEX idx_t_ca_id_dts ( t_ca_id, t_dts), >>>> ADD INDEX idx_t_s_symb_dts ( t_s_symb, t_dts), >>>> ADD INDEX fk_t_st_id (t_st_id), >>>> ADD INDEX fk_t_tt_id (t_tt_id) Just as an aside, the syntax above is a MySQL extension to SQL. There is nothing remotely like it in the standard. However, the alter table statement arrives at Falcon with the current structure of the table, the desired structure of the table, and a bit mask of the changes. If al the changes can be done on-line, Falcon makes the changes. Otherwise Falcon goes back to the blocking algorithm of copy table, make changes, copy table back. Sergey Vojtovich wrote: > > How it happens with InnoDB: > - create temporary table > - copy data to temporary table and create indexes > - rename temporary table > - goes via blocking alter table > > How it happens with Falcon: > - StorageInterface::alter_table_phase2() calls StorageInterface::addIndex() > - ::addIndex() iterates through all to-be-created indexes and calls > ::createIndex() > - goes via online alter table, there seem to be no temporary tables, > no data copy > Right, but what Kevin wants is to add a new mechanism to Falcon which does not currently exist, that will allow it to build multiple indexes in parallel from a single pass over the table. Currently Falcon reads the table once for each index, as Sergey points out. When a table is loaded with pre-defined indexes, Falcon builds all the indexes at once. If there were a createMultipleIndexes method, it could use the same underlying mechanisms to avoid reading the table over and over. 1) This is a performance feature 2) Generally, indexes are added one at a time. The exception to rule 2 is that people with experience with other engines or databases will create a table with no indexes, load data, then add the indexes. For most systems, that's faster than loading indexes and data in parallel. But it won't be for Falcon unless we change the way indexes are built. In fact, loading with indexes in place will be faster. For the interested reader, here approximately is how Falcon builds indexes: For each index, the transaction builds an in-memory deferred index segment. When the segments reach a certain size, they're "chilled" into the serial log. When the load is committed, the actual index pages are created by the gopher thread from the chilled segments. The first segment is converted first, then the second segment is applied to it, then the third segment is applied, etc. Each segment is sorted, but applying the segments one at a time means that (in all likelihood) each index page is visited once for each segment and some indexes segments will overfill and split during the load. Merging the segments in parallel would be significantly more efficient. We probably don't want the gopher to use a merge when adding several segments to an existing index. An advantage of the current addIndex is that it uses less memory than loading multiple indexes in parallel. Best, Ann