From: Zhangzhigang Date: May 8 2012 6:44am Subject: =?utf-8?B?5Zue5aSN77yaIOWbnuWkje+8miDlm57lpI3vvJogV2h5IGlzIGNyZWF0aW5n?= =?utf-8?B?IGluZGV4ZXMgZmFzdGVyIGFmdGVyIGluc2VydGluZyBtYXNzaXZlIGRhdGEg?= =?utf-8?B?cm93cz8=?= List-Archive: http://lists.mysql.com/mysql/227350 Message-Id: <1336459456.42125.YahooMailNeo@web15205.mail.cnb.yahoo.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="-381987588-1803807866-1336459456=:42125" ---381987588-1803807866-1336459456=:42125 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Ok, OS cache.=0A> There isn't really a million of block writes.=C2=A0 The = record gets =0Aadded to the block, but that gets modified in OS cache if we= assume =0AMyISAM tables and in the Innodb buffer if we assume InnoDB table= s.=0A=0AAs i known, the mysql writes the data to disk directly but does not= use the Os cache when the table is updating.=0A=0AIf it writes to the Os c= ache, which leads to massive system invoking, when the table is inserted a = lot of rows one by one. =0A=0A=0A=0A=0A________________________________=0A = =E5=8F=91=E4=BB=B6=E4=BA=BA=EF=BC=9A Karen Abgarian =0A=E6= =94=B6=E4=BB=B6=E4=BA=BA=EF=BC=9A mysql@stripped =0A=E5=8F=91=E9=80= =81=E6=97=A5=E6=9C=9F=EF=BC=9A 2012=E5=B9=B45=E6=9C=888=E6=97=A5, =E6=98=9F= =E6=9C=9F=E4=BA=8C, =E4=B8=8A=E5=8D=88 11:37=0A=E4=B8=BB=E9=A2=98: Re: =E5= =9B=9E=E5=A4=8D=EF=BC=9A =E5=9B=9E=E5=A4=8D=EF=BC=9A Why is creating indexe= s faster after inserting massive data rows?=0A =0AHonestly, I did not under= stand that.=C2=A0 I did not say anything about being complicated.=C2=A0 Wh= at does mysql not use, caching??=0A=0AJudging by experience, creating a uni= que index on say, a 200G table could be a bitter one.=C2=A0 =0A=0A=0AOn 07.= 05.2012, at 19:26, Zhangzhigang wrote:=0A=0A> Karen...=0A> =0A> The mysql d= oes not use this approach what you said which is complicated.=0A> =0A> I=C2= =A0 agree with ohan De Meersman.=0A> =0A> =0A> ____________________________= ____=0A> =E5=8F=91=E4=BB=B6=E4=BA=BA=EF=BC=9A Karen Abgarian =0A> =E6=94=B6=E4=BB=B6=E4=BA=BA=EF=BC=9A mysql@stripped =0A> =E5= =8F=91=E9=80=81=E6=97=A5=E6=9C=9F=EF=BC=9A 2012=E5=B9=B45=E6=9C=888=E6=97= =A5, =E6=98=9F=E6=9C=9F=E4=BA=8C, =E4=B8=8A=E5=8D=88 1:30=0A> =E4=B8=BB=E9= =A2=98: Re: =E5=9B=9E=E5=A4=8D=EF=BC=9A Why is creating indexes faster afte= r inserting massive data rows?=0A> =0A> Hi, =0A> =0A> A couple cents to thi= s. =0A> =0A> There isn't really a million of block writes.=C2=A0 The recor= d gets added to the block, but that gets modified in OS cache if we assume = MyISAM tables and in the Innodb buffer if we assume InnoDB tables.=C2=A0 I= n both cases, the actual writing does not take place and does not slow down= the process.=C2=A0 =C2=A0 What does however happen for each operation, is = processing the statement, locating the entries to update in the index, inde= x block splits and , for good reason, committing.=C2=A0 =0A> =0A> When it c= omes to creating an index, what needs to happen, is to read the whole table= and to sort all rows by the index key.=C2=A0 The latter process will be t= he most determining factor in answering the original question, because for = the large tables the sort will have to do a lot of disk I/O.=C2=A0 =C2=A0 T= he point I am trying to make is there will be situations when creating inde= xes and then inserting the rows will be faster than creating an index after= wards.=C2=A0 If we try to determine such situations, we could notice that = the likelihood of the sort going to disk increases with the amount of disti= nct values to be sorted.=C2=A0 For this reason, my choice would be to crea= te things like primary/unique keys beforehand unless I am certain that ever= ything will fit in the available memory. =0A> =0A> Peace=0A> Karen=0A> =0A>= =0A> =0A> On May 7, 2012, at 8:05 AM, Johan De Meersman wrote:=0A> =0A>> -= ---- Original Message -----=0A>> =0A>>> From: "Zhangzhigang" =0A>> =0A>>> Ok, Creating the index *after* the inserts, the in= dex gets created in=0A>>> a single operation.=0A>>> But the indexes has to = be updating row by row after the data rows has=0A>>> all been inserted. Doe= s it work in this way?=0A>> No, when you create an index on an existing tab= le (like after a mass insert), what happens is that the engine does a singl= e full tablescan and builds the index in a single pass, which is a lot more= performant than updating a single disk block for every record, for the sim= ple reason that a single disk block can contain dozens of index entries. = =0A>> =0A>> Imagine that you insert one million rows, and you have 100 inde= x entries in a disk block (random numbers, to make a point. Real numbers wi= ll depend on storage, file system, index, et cetera). Obviously there's no = way to write less than a single block to disk - that's how it works. =0A>> = =0A>> You can update your index for each record in turn. That means you wil= l need to do 1 million index - and thus block - writes; plus additional rea= ds for those blocks you don't have in memory - that's the index cache. =0A>= > =0A>> Now, if you create a new index on an existing table, you are first = of all bypassing any index read operations - there *is* no index to read, y= et. Then the system is going to do a full tablescan - considered slow, but = you need all the data, so there's no better way anyway. The index will be b= uilt - in-memory as much as possible - and the system will automatically pr= efer to write only complete blocks - 10.000 of them. That's the exact same = number of index blocks, but you only write each block once, so that's only = 10.000 writes instead of 1.000.000. =0A>> =0A>> Now there's a lot more at p= lay, things like B-tree balancing and whatnot, but that's the basic picture= . =0A>> =0A>> -- =0A>> =0A>> Bier met grenadyn =0A>> Is als mosterd by den = wyn =0A>> Sy die't drinkt, is eene kwezel =0A>> Hy die't drinkt, is ras een= ezel =0A> =0A> =0A> -- =0A> MySQL General Mailing List=0A> For list archiv= es: http://lists.mysql.com/mysql=0A> To unsubscribe:=C2=A0 =C2=A0 http://li= sts.mysql.com/mysql=0A=0A=0A--=0AMySQL General Mailing List=0AFor list arch= ives: http://lists.mysql.com/mysql=0ATo unsubscribe:=C2=A0 =C2=A0 http://li= sts.mysql.com/mysql ---381987588-1803807866-1336459456=:42125--