From: Zhangzhigang Date: May 8 2012 2:26am Subject: =?utf-8?B?5Zue5aSN77yaIOWbnuWkje+8miBXaHkgaXMgY3JlYXRpbmcgaW5kZXhlcyBm?= =?utf-8?B?YXN0ZXIgYWZ0ZXIgaW5zZXJ0aW5nIG1hc3NpdmUgZGF0YSByb3dzPw==?= List-Archive: http://lists.mysql.com/mysql/227348 Message-Id: <1336443988.56860.YahooMailNeo@web15203.mail.cnb.yahoo.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="-608863550-283621094-1336443988=:56860" ---608863550-283621094-1336443988=:56860 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Karen...=0A=0AThe mysql does not use this approach what you said which is c= omplicated.=0A=0AI=C2=A0 agree with ohan De Meersman.=0A=0A=0A_____________= ___________________=0A =E5=8F=91=E4=BB=B6=E4=BA=BA=EF=BC=9A Karen Abgarian = =0A=E6=94=B6=E4=BB=B6=E4=BA=BA=EF=BC=9A mysql@stripped= om =0A=E5=8F=91=E9=80=81=E6=97=A5=E6=9C=9F=EF=BC=9A 2012=E5=B9=B45=E6=9C=88= 8=E6=97=A5, =E6=98=9F=E6=9C=9F=E4=BA=8C, =E4=B8=8A=E5=8D=88 1:30=0A=E4=B8= =BB=E9=A2=98: Re: =E5=9B=9E=E5=A4=8D=EF=BC=9A Why is creating indexes faste= r after inserting massive data rows?=0A =0AHi, =0A=0AA couple cents to this= . =0A=0AThere isn't really a million of block writes.=C2=A0 The record get= s added to the block, but that gets modified in OS cache if we assume MyISA= M tables and in the Innodb buffer if we assume InnoDB tables.=C2=A0 In bot= h cases, the actual writing does not take place and does not slow down the = process.=C2=A0 =C2=A0 What does however happen for each operation, is proce= ssing the statement, locating the entries to update in the index, index blo= ck splits and , for good reason, committing.=C2=A0 =0A=0AWhen it comes to c= reating an index, what needs to happen, is to read the whole table and to s= ort all rows by the index key.=C2=A0 The latter process will be the most d= etermining factor in answering the original question, because for the large= tables the sort will have to do a lot of disk I/O.=C2=A0 =C2=A0 The point = I am trying to make is there will be situations when creating indexes and t= hen inserting the rows will be faster than creating an index afterwards.=C2= =A0 If we try to determine such situations, we could notice that the likel= ihood of the sort going to disk increases with the amount of distinct value= s to be sorted.=C2=A0 For this reason, my choice would be to create things= like primary/unique keys beforehand unless I am certain that everything wi= ll fit in the available memory. =0A=0APeace=0AKaren=0A=0A=0A=0AOn May 7, 20= 12, at 8:05 AM, Johan De Meersman wrote:=0A=0A> ----- Original Message ----= -=0A> =0A>> From: "Zhangzhigang" =0A> =0A>> Ok, C= reating the index *after* the inserts, the index gets created in=0A>> a sin= gle operation.=0A>> But the indexes has to be updating row by row after the= data rows has=0A>> all been inserted. Does it work in this way?=0A> No, wh= en you create an index on an existing table (like after a mass insert), wha= t happens is that the engine does a single full tablescan and builds the in= dex in a single pass, which is a lot more performant than updating a single= disk block for every record, for the simple reason that a single disk bloc= k can contain dozens of index entries. =0A> =0A> Imagine that you insert on= e million rows, and you have 100 index entries in a disk block (random numb= ers, to make a point. Real numbers will depend on storage, file system, ind= ex, et cetera). Obviously there's no way to write less than a single block = to disk - that's how it works. =0A> =0A> You can update your index for each= record in turn. That means you will need to do 1 million index - and thus = block - writes; plus additional reads for those blocks you don't have in me= mory - that's the index cache. =0A> =0A> Now, if you create a new index on = an existing table, you are first of all bypassing any index read operations= - there *is* no index to read, yet. Then the system is going to do a full = tablescan - considered slow, but you need all the data, so there's no bette= r way anyway. The index will be built - in-memory as much as possible - and= the system will automatically prefer to write only complete blocks - 10.00= 0 of them. That's the exact same number of index blocks, but you only write= each block once, so that's only 10.000 writes instead of 1.000.000. =0A> = =0A> Now there's a lot more at play, things like B-tree balancing and whatn= ot, but that's the basic picture. =0A> =0A> -- =0A> =0A> Bier met grenadyn = =0A> Is als mosterd by den wyn =0A> Sy die't drinkt, is eene kwezel =0A> Hy= die't drinkt, is ras een ezel =0A=0A=0A-- =0AMySQL General Mailing List=0A= For list archives: http://lists.mysql.com/mysql=0ATo unsubscribe:=C2=A0 =C2= =A0 http://lists.mysql.com/mysql ---608863550-283621094-1336443988=:56860--