List:General Discussion« Previous MessageNext Message »
From:Karen Abgarian Date:May 8 2012 3:37am
Subject:Re: 回复: 回复
: Why is creating indexes faster after inserti
ng massive data rows?
View as plain text  
Honestly, I did not understand that.   I did not say anything about being complicated. 
What does mysql not use, caching??

Judging by experience, creating a unique index on say, a 200G table could be a bitter one.
  


On 07.05.2012, at 19:26, Zhangzhigang wrote:

> Karen...
> 
> The mysql does not use this approach what you said which is complicated.
> 
> I  agree with ohan De Meersman.
> 
> 
> ________________________________
> 发件人: Karen Abgarian <abvk@stripped>
> 收件人: mysql@stripped 
> 发送日期: 2012年5月8日, 星期二, 上午 1:30
> 主题: Re: 回复: Why is creating indexes faster after inserting massive data
> rows?
> 
> Hi, 
> 
> A couple cents to this. 
> 
> There isn't really a million of block writes.   The record gets added to the block,
> but that gets modified in OS cache if we assume MyISAM tables and in the Innodb buffer if
> we assume InnoDB tables.   In both cases, the actual writing does not take place and does
> not slow down the process.    What does however happen for each operation, is processing
> the statement, locating the entries to update in the index, index block splits and , for
> good reason, committing.  
> 
> When it comes to creating an index, what needs to happen, is to read the whole table
> and to sort all rows by the index key.   The latter process will be the most determining
> factor in answering the original question, because for the large tables the sort will have
> to do a lot of disk I/O.    The point I am trying to make is there will be situations when
> creating indexes and then inserting the rows will be faster than creating an index
> afterwards.   If we try to determine such situations, we could notice that the likelihood
> of the sort going to disk increases with the amount of distinct values to be sorted.   For
> this reason, my choice would be to create things like primary/unique keys beforehand
> unless I am certain that everything will fit in the available memory. 
> 
> Peace
> Karen
> 
> 
> 
> On May 7, 2012, at 8:05 AM, Johan De Meersman wrote:
> 
>> ----- Original Message -----
>> 
>>> From: "Zhangzhigang" <zzgang_2008@stripped>
>> 
>>> Ok, Creating the index *after* the inserts, the index gets created in
>>> a single operation.
>>> But the indexes has to be updating row by row after the data rows has
>>> all been inserted. Does it work in this way?
>> No, when you create an index on an existing table (like after a mass insert),
> what happens is that the engine does a single full tablescan and builds the index in a
> single pass, which is a lot more performant than updating a single disk block for every
> record, for the simple reason that a single disk block can contain dozens of index
> entries. 
>> 
>> Imagine that you insert one million rows, and you have 100 index entries in a
> disk block (random numbers, to make a point. Real numbers will depend on storage, file
> system, index, et cetera). Obviously there's no way to write less than a single block to
> disk - that's how it works. 
>> 
>> You can update your index for each record in turn. That means you will need to do
> 1 million index - and thus block - writes; plus additional reads for those blocks you
> don't have in memory - that's the index cache. 
>> 
>> Now, if you create a new index on an existing table, you are first of all
> bypassing any index read operations - there *is* no index to read, yet. Then the system is
> going to do a full tablescan - considered slow, but you need all the data, so there's no
> better way anyway. The index will be built - in-memory as much as possible - and the
> system will automatically prefer to write only complete blocks - 10.000 of them. That's
> the exact same number of index blocks, but you only write each block once, so that's only
> 10.000 writes instead of 1.000.000. 
>> 
>> Now there's a lot more at play, things like B-tree balancing and whatnot, but
> that's the basic picture. 
>> 
>> -- 
>> 
>> Bier met grenadyn 
>> Is als mosterd by den wyn 
>> Sy die't drinkt, is eene kwezel 
>> Hy die't drinkt, is ras een ezel 
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql

Thread
Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
  • Re: Why is creating indexes faster after inserting massive data rows?Ananda Kumar7 May
    • 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
  • Re: Why is creating indexes faster after inserting massive data rows?Johan De Meersman7 May
    • 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
      • Re: 回复: Why is creating indexes faster after inserting massive data rows?Alex Schaft7 May
        • 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
          • Re: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Claudio Nanni7 May
            • too nice not to share it!hsv2 Oct
          • Re: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Mihamina Rakotomandimby30 May
      • Re: 回复: Why is creating indexes faster after inserting massive data rows?Claudio Nanni7 May
        • 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
      • Re: 回复: Why is creating indexes faster after inserting massive data rows?Johan De Meersman7 May
        • RE: 回复: Why is creating indexes faster after inserting massive data rows?Rick James7 May
        • Re: 回复: Why is creating indexes faster after inserting massive data rows?Karen Abgarian7 May
          • 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang8 May
            • Re: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Karen Abgarian8 May
              • 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang8 May
                • Re: 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Johan De Meersman8 May
                  • 回复: 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang8 May
                  • Re: 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Karen Abgarian8 May
                    • 回复: 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang9 May
              • RE: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Rick James10 May
            • Re: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Johan De Meersman8 May
        • Re: 回复: Why is creating indexes faster after inserting massive data rows?Karen Abgarian7 May
    • RE: Why is creating indexes faster after inserting massive datarows?Rick James7 May
      • 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang9 May
        • RE: 回复: Why is creating indexes faster after inserting massive data rows?Rick James9 May
          • Re: 回复: Why is creating indexes faster after inserting massive data rows?Claudio Nanni9 May
          • 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang10 May