List: General Discussion
From: Zhangzhigang
Date: May 10 2012 4:59am
Subject: Re: Re: Why is creating indexes faster after inserting massive data rows?
> The “output” from the sortmerge is fed into code that builds the BTree for
> the table.  This building of the BTree is sequential – fill the first block,
> move on to the next block, and never have to go back.

James,

Thank you for your answer; it was very clear.

Firstly:

Before accepting this answer, I thought that building the BTree had to do "block splits", which cause random I/O.  Now I understand that MySQL avoids the "block splits" by using "sort merge" to build the BTree, so it does not do more "random" I/O.


Secondly:

It also bypasses the BTree traversals which, when the index is too big to be cached, involve disk hit(s) for each row inserted.


Thank you very much.


Sincerely yours,
Zhigang Zhang


________________________________
From: Rick James <rjames@stripped>
To: Zhangzhigang <zzgang_2008@stripped>
Cc: "mysql@stripped" <mysql@stripped>
Sent: Wednesday, May 9, 2012, 11:21 PM
Subject: RE: Re: Why is creating indexes faster after inserting massive data rows?
 

A BTree that is small enough to be cached in RAM can be quickly maintained.  Even the “block splits” are not too costly without the I/O.
 
A big file that needs sorting – bigger than can be cached in RAM – is more efficiently done with a dedicated “sort merge” program.  A “big” INDEX on a table may be big enough to fall into this category.
 
I/O is the most costly part of any of these operations.  My rule of thumb for MySQL SQL statements is:  If everything is cached, the query will run ten times as fast as it would if things have to be fetched from disk.
 
Sortmerge works this way:
1.  Sort as much of the file as you can in RAM.  Write that sorted piece to disk.
2.  Repeat for the next chunk of the file.  Repeat until the input file is broken into sorted chunks.
3.  Now, “merge” those chunks together – take the first row from each, decide which is the “smallest”, send it to the output.
4.  Repeat until finished with all the pieces.
For a really big task, there may have to be more than one “merge” pass.
Note how sort merge reads the input sequentially once, writes the output sequentially once, and has sequential I/O for each merge chunk.
“Sequential” I/O is faster than “random” I/O – no arm motion on traditional disks.  (SSDs are a different matter; I won’t go into that.)
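Here is a minimal sketch of those four steps in Python; the chunk size, the temporary-file handling, and the newline-terminated record format are all assumptions for illustration:

import heapq
import tempfile
from itertools import islice

CHUNK_ROWS = 100_000  # assumed size of a chunk that fits in RAM

def sort_merge(input_path, output_path):
    # Steps 1-2: sort each RAM-sized chunk, write it to disk sequentially.
    chunks = []
    with open(input_path) as src:
        while True:
            rows = list(islice(src, CHUNK_ROWS))
            if not rows:
                break
            rows.sort()                          # sort one chunk in RAM
            tmp = tempfile.TemporaryFile(mode="w+")
            tmp.writelines(rows)                 # sequential write
            tmp.seek(0)
            chunks.append(tmp)
    # Steps 3-4: repeatedly emit the "smallest" head row among the chunks.
    with open(output_path, "w") as out:
        out.writelines(heapq.merge(*chunks))     # sequential read of each chunk
    for tmp in chunks:
        tmp.close()

This sketch does a single merge pass; a really big input would need the extra “merge” passes Rick mentions.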
 
The “output” from the sortmerge is fed into code that builds the BTree for the table.  This building of the BTree is sequential – fill the first block, move on to the next block, and never have to go back.
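To make that concrete, a toy sketch of filling leaf blocks left to right from the sorted stream (the capacity and block layout are assumptions, not InnoDB's actual page format):

LEAF_CAPACITY = 128  # assumed number of keys per block

def build_leaf_level(sorted_keys):
    """Yield leaf blocks from an already-sorted key stream, left to right."""
    leaf = []
    for key in sorted_keys:
        leaf.append(key)
        if len(leaf) == LEAF_CAPACITY:
            yield leaf            # block is full: write it out, never revisit it
            leaf = []
    if leaf:
        yield leaf                # last, possibly partial, block

The first key of each emitted block would feed the same routine again to build the next level up, so every level is written sequentially and no block ever splits.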
 
BTrees (when built randomly), if they need to spill to disk, will involve random I/O.  (And we are talking about an INDEX that is so big that it needs to spill to disk.)
 
When a block “splits”, one full block becomes two half-full blocks.  Randomly filling a BTree leads to, on average, the index being 69% full.  This is not a big factor in the overall issue, but perhaps worth noting.
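That 69% figure (about ln 2) is easy to reproduce with a toy model: random keys go into fixed-capacity sorted blocks, and a full block splits into two half-full ones.  This simulates only the leaf level and is an illustration, not MySQL's page logic:

import random
from bisect import bisect_right, insort

CAPACITY = 128          # assumed keys per block
firsts = [0.0]          # first key of each leaf block
blocks = [[0.0]]        # leaf blocks, each kept sorted

for _ in range(100_000):
    key = random.random()
    i = bisect_right(firsts, key) - 1      # leaf that should hold the key
    insort(blocks[i], key)
    if len(blocks[i]) > CAPACITY:          # "split": one full -> two half-full
        mid = len(blocks[i]) // 2
        right = blocks[i][mid:]
        blocks[i] = blocks[i][:mid]
        blocks.insert(i + 1, right)
        firsts.insert(i + 1, right[0])

fill = sum(map(len, blocks)) / (len(blocks) * CAPACITY)
print(f"average block fill: {fill:.0%}")   # tends toward ~69%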
 
How bad can it get?  Here’s an example.
·  You have an INDEX on some random value, such as a GUID or MD5.
·  The INDEX will be 5 times as big as you can fit in RAM.
·  MySQL is adding to the BTree one row at a time (the non-sortmerge way).
When it is nearly finished, only 1 of 5 updates to the BTree can be done immediately in RAM; 4 out of 5 updates to the BTree will have to hit disk.  If you are using normal disks, that is on the order of 125 rows per second that you can insert – Terrible!  Sortmerge is likely to average over 10,000.
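As a back-of-envelope check of that 125 rows/second figure (the 10 ms per random I/O is my assumption for a typical spinning disk, not a number from the original message):

io_ms = 10.0                      # assumed ~10 ms per random disk I/O
ios_per_sec = 1000 / io_ms        # -> 100 random I/Os per second
ios_per_row = 4 / 5               # 4 of 5 BTree updates must hit disk
rows_per_sec = ios_per_sec / ios_per_row
print(rows_per_sec)               # -> 125.0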
 
 
 
From: Zhangzhigang [mailto:zzgang_2008@stripped]
Sent: Tuesday, May 08, 2012 9:13 PM
To: Rick James
Cc: mysql@stripped
Subject: Re: Why is creating indexes faster after inserting massive data rows?
 
James,

> * By doing all the indexes after building the table (or at least all the
> non-UNIQUE indexes), "sort merge" can be used.  This technique has been
> highly optimized over the past half-century, and is more efficient.
 
I have a question about "sort merge":

Why does it do the whole "sort merge"?

In my opinion, it could just maintain the BTree, inserting each key into a BTree node that holds a few sorted keys, and that would perform well.

If it only does the "sort merge", the BTree data structure has to be created separately, which wastes some performance.

Doesn't it?
 
 

________________________________

From: Rick James <rjames@stripped>
To: Johan De Meersman <vegivamp@stripped>; Zhangzhigang <zzgang_2008@stripped>
Cc: "mysql@stripped" <mysql@stripped>
Sent: Tuesday, May 8, 2012, 12:35 AM
Subject: RE: Why is creating indexes faster after inserting massive data rows?

* Batch INSERTs run faster than one-row-at-a-time, but this is unrelated to INDEX updating speed.
* The cache size is quite important to dealing with indexing during INSERT; see http://mysql.rjweb.org/doc.php/memory
* Note that mysqldump sets up for an efficient creation of indexes after loading the data.  This is not practical (or necessarily efficient) when incrementally INSERTing into a table.

As for the original question...
* Updating the index(es) for one row often involves random BTree traversals.  When the index(es) are too big to be cached, this can involve disk hit(s) for each row inserted.
* By doing all the indexes after building the table (or at least all the non-UNIQUE indexes), "sort merge" can be used.  This technique has been highly optimized over the past half-century, and is more efficient.
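For illustration, the load-then-index pattern might look like this from a client; the table, columns, data, and connection details are hypothetical, and it assumes the mysql-connector-python package:

import mysql.connector  # assumes mysql-connector-python is installed

conn = mysql.connector.connect(user="user", password="...", database="test")
cur = conn.cursor()

# Create the table without the secondary index, bulk load, then index once.
cur.execute("CREATE TABLE t (id INT PRIMARY KEY, val CHAR(32))")
rows = [(i, f"value{i}") for i in range(100_000)]        # stand-in data
cur.executemany("INSERT INTO t (id, val) VALUES (%s, %s)", rows)
conn.commit()
cur.execute("ALTER TABLE t ADD INDEX val_idx (val)")     # index built in one pass
conn.close()

The final ALTER builds the index from the fully loaded table, which is where the sort merge pays off.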


> -----Original Message-----
> From: Johan De Meersman [mailto:vegivamp@stripped]
> Sent: Monday, May 07, 2012 1:29 AM
> To: Zhangzhigang
> Cc: mysql@stripped
> Subject: Re: Why is creating indexes faster after inserting massive
> data rows?
> 
> ----- Original Message -----
> > From: "Zhangzhigang" <zzgang_2008@stripped>
> >
> > Creating indexes after inserting massive data rows is faster than
> > before inserting data rows.
> > Please tell me why.
> 
> Plain and simple: the indices get updated after every insert statement,
> whereas if you only create the index *after* the inserts, the index
> gets created in a single operation, which is a lot more efficient.
> 
> I seem to recall that inside of a transaction (thus, InnoDB or so) the
> difference is markedly less; I might be wrong, though.
> 
> 
> --
> Bier met grenadyn
> Is als mosterd by den wyn
> Sy die't drinkt, is eene kwezel
> Hy die't drinkt, is ras een ezel
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql 
> To unsubscribe:    http://lists.mysql.com/mysql 