List:Falcon Storage Engine« Previous MessageNext Message »
From:Ann W. Harrison Date:December 10 2008 6:57pm
Subject:Re: [Fwd: Re: More Falcon fun]
View as plain text  
 > Kevin Lewis wrote:
 >>  Apparently, InnoDB can make one pass through the database and
 >> collect the index values for multiple new indexes, building
 >> them all at once.  Falcon could do the same building multiple
 >> deferredIndexes and let the gopher threads affectively build
 >> the indexes  from presorted deferredIndex groups.
 >> But I do not think that Falcon receives a multiple createIndex call.
 >> can you check how InnoDB does this and how Falcon handles something 
 >> like:
 >>>> ALTER TABLE trade
 >>>>  ADD INDEX idx_t_ca_id_dts ( t_ca_id, t_dts),
 >>>>  ADD INDEX idx_t_s_symb_dts ( t_s_symb, t_dts),
 >>>>  ADD INDEX fk_t_st_id (t_st_id),
 >>>>  ADD INDEX fk_t_tt_id (t_tt_id)

Just as an aside, the syntax above is a MySQL extension to
SQL.  There is nothing remotely like it in the standard.

However, the alter table statement arrives at Falcon with the
current structure of the table, the desired structure of the
table, and a bit mask of the changes. If al the changes can
be done on-line, Falcon makes the changes.  Otherwise Falcon
goes back to the blocking algorithm of copy table, make
changes, copy table back.

Sergey Vojtovich wrote:
> How it happens with InnoDB:
> - create temporary table
> - copy data to temporary table and create indexes
> - rename temporary table
> - goes via blocking alter table
> How it happens with Falcon:
> - StorageInterface::alter_table_phase2() calls StorageInterface::addIndex()
> - ::addIndex() iterates through all to-be-created indexes and calls
>   ::createIndex()
> - goes via online alter table, there seem to be no temporary tables,
>   no data copy

Right, but what Kevin wants is to add a new mechanism to Falcon which
does not currently exist, that will allow it to build multiple indexes
in parallel from a single pass over the table.  Currently Falcon reads
the table once for each index, as Sergey points out.

When a table is loaded with pre-defined indexes, Falcon builds all the
indexes at once. If there were a createMultipleIndexes method, it could
use the same underlying mechanisms to avoid reading the table over and

1) This is a performance feature
2) Generally, indexes are added one at a time.

The exception to rule 2 is that people with experience with other
engines or databases will create a table with no indexes, load data,
then add the indexes.  For most systems, that's faster than loading
indexes and data in parallel.  But it won't be for Falcon unless we
change the way indexes are built.  In fact, loading with indexes in
place will be faster.

For the interested reader, here approximately is how Falcon builds

For each index, the transaction builds an in-memory deferred index
segment.  When the segments reach a certain size, they're "chilled"
into the serial log.  When the load is committed, the actual index pages
are created by the gopher thread from the chilled segments.  The first
segment is converted first, then the second segment is applied to it,
then the third segment is applied, etc.  Each segment is sorted, but
applying the segments one at a time means that (in all likelihood)
each index page is visited once for each segment and some indexes
segments will overfill and split during the load.

Merging the segments in parallel would be significantly more efficient.
We probably don't want the gopher to use a merge when adding several
segments to an existing index.

An advantage of the current addIndex is that it uses less memory than
loading multiple indexes in parallel.


Re: [Fwd: Re: More Falcon fun]Ann W. Harrison10 Dec
Re: [Fwd: Re: More Falcon fun]Ann W. Harrison11 Dec
Re: [Fwd: Re: More Falcon fun]Ann W. Harrison11 Dec
  • Re: [Fwd: Re: More Falcon fun]Kevin Lewis11 Dec