List: General Discussion
From: Rick James  Date: May 7 2012 5:20pm
Subject: RE: Re: Why is creating indexes faster after inserting massive data rows?
As a side note, TokuDB uses what it calls "fractal tree" technology to somewhat improve the performance of incremental indexing.  It delays some of the B-tree work so that it can batch things better.  While waiting for that work to finish, queries are smart enough to look in more than one place for the index info.

InnoDB does something similar with its insert (change) buffer, but that buffering is limited to the size of the buffer_pool.
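The delayed-merge idea above can be sketched roughly as follows. This is a toy model, not TokuDB's or InnoDB's actual implementation: pending index entries accumulate in a small in-memory buffer, lookups check both the buffer and the main structure, and the buffer is merged into the sorted index in one batch. The class name and threshold are made up for illustration.

```python
import bisect

class BufferedIndex:
    """Toy model of delayed index maintenance: inserts land in a small
    unsorted buffer (cheap, no B-tree work yet); lookups check both
    places; the buffer is merged into the sorted main index in one
    batch once it fills up."""

    def __init__(self, merge_threshold=4):
        self.main = []                    # sorted "on-disk" index
        self.buffer = []                  # pending, unmerged entries
        self.merge_threshold = merge_threshold

    def insert(self, key):
        self.buffer.append(key)           # defer the expensive work
        if len(self.buffer) >= self.merge_threshold:
            self._merge()

    def _merge(self):
        # One batched update instead of many scattered ones.
        for key in sorted(self.buffer):
            bisect.insort(self.main, key)
        self.buffer.clear()

    def contains(self, key):
        # Queries must look in more than one place for the index info.
        i = bisect.bisect_left(self.main, key)
        return (i < len(self.main) and self.main[i] == key) or key in self.buffer

idx = BufferedIndex()
for k in [5, 3, 9]:
    idx.insert(k)
print(idx.contains(3), idx.contains(7))   # True False
```

The point of the buffer is that inserts stay cheap while the expensive sorted-structure maintenance happens in batches, at the price of slightly more work per lookup.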

> -----Original Message-----
> From: Johan De Meersman [mailto:vegivamp@stripped]
> Sent: Monday, May 07, 2012 8:06 AM
> To: Zhangzhigang
> Cc: mysql@stripped
> Subject: Re: Re: Why is creating indexes faster after inserting
> massive data rows?
> 
> ----- Original Message -----
> 
> > From: "Zhangzhigang" <zzgang_2008@yahoo.com.cn>
> 
> > OK, creating the index *after* the inserts means the index gets
> > created in a single operation. But doesn't the index still have to be
> > updated row by row after the data rows have all been inserted? Does
> > it work that way?
> No. When you create an index on an existing table (as after a mass
> insert), the engine does a single full table scan and builds the index
> in a single pass. That is a lot more performant than updating a disk
> block for every record, for the simple reason that a single disk block
> can contain dozens of index entries.
> 
> Imagine that you insert one million rows, and that 100 index entries
> fit in a disk block (arbitrary numbers, just to make a point; real
> numbers will depend on storage, file system, index, et cetera).
> Obviously there's no way to write less than a single block to disk -
> that's how disks work.
> 
> You can update your index for each record in turn. That means you will
> need to do one million index - and thus block - writes, plus
> additional reads for any blocks you don't have in memory - that's what
> the index cache is for.
> 
> Now, if you create a new index on an existing table, you are first of
> all bypassing any index read operations - there *is* no index to read
> yet. Then the system is going to do a full table scan - considered
> slow, but you need all the data, so there's no better way anyway. The
> index will be built - in memory as much as possible - and the system
> will automatically prefer to write only complete blocks - 10,000 of
> them. That's the exact same number of index blocks, but you only write
> each block once, so that's only 10,000 writes instead of 1,000,000.
> 
> Now there's a lot more at play, things like B-tree balancing and
> whatnot, but that's the basic picture.
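The write-count argument above can be checked with a small simulation. This is a rough illustration only: the block size, cache size, and row count are made-up parameters (scaled down from the mail's one-million-row example), and a real engine's caching is far more sophisticated than this LRU toy.

```python
import random
from collections import OrderedDict

random.seed(42)
ROWS = 100_000             # scaled down from the mail's 1,000,000
ENTRIES_PER_BLOCK = 100    # the mail's illustrative figure
CACHE_BLOCKS = 50          # assumed index-cache size, for illustration

keys = list(range(ROWS))
random.shuffle(keys)       # inserts arrive in arbitrary key order

# Strategy 1: update the index row by row through a small LRU block
# cache; every eviction is a block write.
cache, writes_incremental = OrderedDict(), 0
for k in keys:
    block = k // ENTRIES_PER_BLOCK
    if block in cache:
        cache.move_to_end(block)          # cache hit: no I/O
    else:
        if len(cache) >= CACHE_BLOCKS:
            cache.popitem(last=False)     # evict and flush coldest block
            writes_incremental += 1
        cache[block] = True
writes_incremental += len(cache)          # flush remaining dirty blocks

# Strategy 2: build the index afterwards: sort once, then write each
# completely filled block exactly once.
writes_bulk = ROWS // ENTRIES_PER_BLOCK   # 1,000 full-block writes

print(writes_incremental, writes_bulk)
```

With a cache covering only a fraction of the index, the row-by-row strategy ends up writing most blocks many times over, while the bulk build writes each block exactly once - the same gap the mail describes.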
> 
> --
> 
> Bier met grenadyn
> Is als mosterd by den wyn
> Sy die't drinkt, is eene kwezel
> Hy die't drinkt, is ras een ezel
Thread
Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 7 May)
  • Re: Why is creating indexes faster after inserting massive data rows? (Ananda Kumar, 7 May)
    • Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 7 May)
  • Re: Why is creating indexes faster after inserting massive data rows? (Johan De Meersman, 7 May)
    • Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 7 May)
      • Re: Re: Why is creating indexes faster after inserting massive data rows? (Alex Schaft, 7 May)
        • Re: Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 7 May)
          • Re: Re: Re: Why is creating indexes faster after inserting massive data rows? (Claudio Nanni, 7 May)
            • too nice not to share it! (hsv, 2 Oct)
          • Re: Re: Re: Why is creating indexes faster after inserting massive data rows? (Mihamina Rakotomandimby, 30 May)
      • Re: Re: Why is creating indexes faster after inserting massive data rows? (Claudio Nanni, 7 May)
        • Re: Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 7 May)
      • Re: Re: Why is creating indexes faster after inserting massive data rows? (Johan De Meersman, 7 May)
        • RE: Re: Why is creating indexes faster after inserting massive data rows? (Rick James, 7 May)
        • Re: Re: Why is creating indexes faster after inserting massive data rows? (Karen Abgarian, 7 May)
          • Re: Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 8 May)
            • Re: Re: Re: Why is creating indexes faster after inserting massive data rows? (Karen Abgarian, 8 May)
              • Re: Re: Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 8 May)
                • Re: Re: Re: Re: Why is creating indexes faster after inserting massive data rows? (Johan De Meersman, 8 May)
                  • Re: Re: Re: Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 8 May)
                  • Re: Re: Re: Re: Why is creating indexes faster after inserting massive data rows? (Karen Abgarian, 8 May)
                    • Re: Re: Re: Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 9 May)
              • RE: Re: Re: Why is creating indexes faster after inserting massive data rows? (Rick James, 10 May)
            • Re: Re: Re: Why is creating indexes faster after inserting massive data rows? (Johan De Meersman, 8 May)
        • Re: Re: Why is creating indexes faster after inserting massive data rows? (Karen Abgarian, 7 May)
    • RE: Why is creating indexes faster after inserting massive data rows? (Rick James, 7 May)
      • Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 9 May)
        • RE: Re: Why is creating indexes faster after inserting massive data rows? (Rick James, 9 May)
          • Re: Re: Why is creating indexes faster after inserting massive data rows? (Claudio Nanni, 9 May)
          • Re: Re: Why is creating indexes faster after inserting massive data rows? (Zhangzhigang, 10 May)