List:General Discussion« Previous MessageNext Message »
From:Rick James Date:May 10 2012 4:03pm
Subject:RE: 回复: 回复: Why is creating index
es faster after inserting massive data
rows?
View as plain text  
One more wrinkle...

When adding a UNIQUE index, MySQL must build the BTree, and cannot do a sortmerge.  (I think this is a true statement, but I am not positive.)  The reason...

A UNIQUE index is two things:  an INDEX, and a UNUQUEness "constraint".  In order to enforce the constraint, it must check each record as it is inserted into the table.  This means that the index must exist, and contain all the row that have been added so far.

Conclusion:  Do not make an INDEX "UNIQUE" unless you really need it.  Example:
PRIMARY KEY (a,b)
UNIQUE (b,a) -- The UNIQUEness constraint here adds nothing useful; make it just INDEX.

There are still more wrinkles...
* InnoDB benefits from inserting rows in the PRIMARY KEY order.
* INSERT ... (SELECT ... ORDER BY ...) -- Sometimes it is useful to do a sortmerge in the SELECT in order to make the INSERT more efficient.
* ALTER TABLE ... ORDER BY ... -- For MyISAM (not for InnoDB), this can "cluster" the rows to make certain SELECTs do fewer disk hits.

Massive "data warehouse" "fact tables" often should have nothing but a PRIMARY KEY.  All big SELECTs should hit "summary tables" that aggregate data to make "reports" more efficient.  (I have seen 10x to 1000x performance improvement.)  Should we discuss this?


> -----Original Message-----
> From: Karen Abgarian [mailto:abvk@stripped]
> Sent: Monday, May 07, 2012 8:37 PM
> To: mysql@strippedom
> Subject: Re: 回复: 回复: Why is creating indexes faster after
> inserting massive data rows?
> 
> Honestly, I did not understand that.   I did not say anything about
> being complicated.  What does mysql not use, caching??
> 
> Judging by experience, creating a unique index on say, a 200G table
> could be a bitter one.
> 
> 
> On 07.05.2012, at 19:26, Zhangzhigang wrote:
> 
> > Karen...
> >
> > The mysql does not use this approach what you said which is
> complicated.
> >
> > I  agree with ohan De Meersman.
> >
> >
> > ________________________________
> > 发件人: Karen Abgarian <abvk@stripped>
> > 收件人: mysql@stripped
> > 发送日期: 2012年5月8日, 星期二, 上午 1:30
> > 主题: Re: 回复: Why is creating indexes faster after inserting
> massive data rows?
> >
> > Hi,
> >
> > A couple cents to this.
> >
> > There isn't really a million of block writes.   The record gets added
> to the block, but that gets modified in OS cache if we assume MyISAM
> tables and in the Innodb buffer if we assume InnoDB tables.   In both
> cases, the actual writing does not take place and does not slow down
> the process.    What does however happen for each operation, is
> processing the statement, locating the entries to update in the index,
> index block splits and , for good reason, committing.
> >
> > When it comes to creating an index, what needs to happen, is to read
> the whole table and to sort all rows by the index key.   The latter
> process will be the most determining factor in answering the original
> question, because for the large tables the sort will have to do a lot
> of disk I/O.    The point I am trying to make is there will be
> situations when creating indexes and then inserting the rows will be
> faster than creating an index afterwards.   If we try to determine such
> situations, we could notice that the likelihood of the sort going to
> disk increases with the amount of distinct values to be sorted.   For
> this reason, my choice would be to create things like primary/unique
> keys beforehand unless I am certain that everything will fit in the
> available memory.
> >
> > Peace
> > Karen
> >
> >
> >
> > On May 7, 2012, at 8:05 AM, Johan De Meersman wrote:
> >
> >> ----- Original Message -----
> >>
> >>> From: "Zhangzhigang" <zzgang_2008@strippedom.cn>
> >>
> >>> Ok, Creating the index *after* the inserts, the index gets created
> >>> in a single operation.
> >>> But the indexes has to be updating row by row after the data rows
> >>> has all been inserted. Does it work in this way?
> >> No, when you create an index on an existing table (like after a mass
> insert), what happens is that the engine does a single full tablescan
> and builds the index in a single pass, which is a lot more performant
> than updating a single disk block for every record, for the simple
> reason that a single disk block can contain dozens of index entries.
> >>
> >> Imagine that you insert one million rows, and you have 100 index
> entries in a disk block (random numbers, to make a point. Real numbers
> will depend on storage, file system, index, et cetera). Obviously
> there's no way to write less than a single block to disk - that's how
> it works.
> >>
> >> You can update your index for each record in turn. That means you
> will need to do 1 million index - and thus block - writes; plus
> additional reads for those blocks you don't have in memory - that's the
> index cache.
> >>
> >> Now, if you create a new index on an existing table, you are first
> of all bypassing any index read operations - there *is* no index to
> read, yet. Then the system is going to do a full tablescan - considered
> slow, but you need all the data, so there's no better way anyway. The
> index will be built - in-memory as much as possible - and the system
> will automatically prefer to write only complete blocks - 10.000 of
> them. That's the exact same number of index blocks, but you only write
> each block once, so that's only 10.000 writes instead of 1.000.000.
> >>
> >> Now there's a lot more at play, things like B-tree balancing and
> whatnot, but that's the basic picture.
> >>
> >> --
> >>
> >> Bier met grenadyn
> >> Is als mosterd by den wyn
> >> Sy die't drinkt, is eene kwezel
> >> Hy die't drinkt, is ras een ezel
> >
> >
> > --
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/mysql
> 
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql

Thread
Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
  • Re: Why is creating indexes faster after inserting massive data rows?Ananda Kumar7 May
    • 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
  • Re: Why is creating indexes faster after inserting massive data rows?Johan De Meersman7 May
    • 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
      • Re: 回复: Why is creating indexes faster after inserting massive data rows?Alex Schaft7 May
        • 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
          • Re: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Claudio Nanni7 May
            • too nice not to share it!hsv2 Oct
          • Re: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Mihamina Rakotomandimby30 May
      • Re: 回复: Why is creating indexes faster after inserting massive data rows?Claudio Nanni7 May
        • 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang7 May
      • Re: 回复: Why is creating indexes faster after inserting massive data rows?Johan De Meersman7 May
        • RE: 回复: Why is creating indexes faster after inserting massive data rows?Rick James7 May
        • Re: 回复: Why is creating indexes faster after inserting massive data rows?Karen Abgarian7 May
          • 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang8 May
            • Re: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Karen Abgarian8 May
              • 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang8 May
                • Re: 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Johan De Meersman8 May
                  • 回复: 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang8 May
                  • Re: 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Karen Abgarian8 May
                    • 回复: 回复: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang9 May
              • RE: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Rick James10 May
            • Re: 回复: 回复: Why is creating indexes faster after inserting massive data rows?Johan De Meersman8 May
        • Re: 回复: Why is creating indexes faster after inserting massive data rows?Karen Abgarian7 May
    • RE: Why is creating indexes faster after inserting massive datarows?Rick James7 May
      • 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang9 May
        • RE: 回复: Why is creating indexes faster after inserting massive data rows?Rick James9 May
          • Re: 回复: Why is creating indexes faster after inserting massive data rows?Claudio Nanni9 May
          • 回复: 回复: Why is creating indexes faster after inserting massive data rows?Zhangzhigang10 May