From: Karen Abgarian Date: May 8 2012 3:37am Subject: =?utf-8?Q?Re=3A_=E5=9B=9E=E5=A4=8D=EF=BC=9A_=E5=9B=9E=E5=A4=8D?= =?utf-8?Q?=EF=BC=9A_Why_is_creating_indexes_faster_after_inserti?= =?utf-8?Q?ng_massive_data_rows=3F?= List-Archive: http://lists.mysql.com/mysql/227349 Message-Id: <29AFA7F1-4CAF-4FF3-93E0-5995E43A2358@apple.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Honestly, I did not understand that. I did not say anything about = being complicated. What does mysql not use, caching?? Judging by experience, creating a unique index on say, a 200G table = could be a bitter one. =20 On 07.05.2012, at 19:26, Zhangzhigang wrote: > Karen... >=20 > The mysql does not use this approach what you said which is = complicated. >=20 > I agree with ohan De Meersman. >=20 >=20 > ________________________________ > =E5=8F=91=E4=BB=B6=E4=BA=BA=EF=BC=9A Karen Abgarian > =E6=94=B6=E4=BB=B6=E4=BA=BA=EF=BC=9A mysql@stripped=20 > =E5=8F=91=E9=80=81=E6=97=A5=E6=9C=9F=EF=BC=9A 2012=E5=B9=B45=E6=9C=888=E6= =97=A5, =E6=98=9F=E6=9C=9F=E4=BA=8C, =E4=B8=8A=E5=8D=88 1:30 > =E4=B8=BB=E9=A2=98: Re: =E5=9B=9E=E5=A4=8D=EF=BC=9A Why is creating = indexes faster after inserting massive data rows? >=20 > Hi,=20 >=20 > A couple cents to this.=20 >=20 > There isn't really a million of block writes. The record gets added = to the block, but that gets modified in OS cache if we assume MyISAM = tables and in the Innodb buffer if we assume InnoDB tables. In both = cases, the actual writing does not take place and does not slow down the = process. What does however happen for each operation, is processing = the statement, locating the entries to update in the index, index block = splits and , for good reason, committing. =20 >=20 > When it comes to creating an index, what needs to happen, is to read = the whole table and to sort all rows by the index key. The latter = process will be the most determining factor in answering the original = question, because for the large tables the sort will have to do a lot of = disk I/O. The point I am trying to make is there will be situations = when creating indexes and then inserting the rows will be faster than = creating an index afterwards. If we try to determine such situations, = we could notice that the likelihood of the sort going to disk increases = with the amount of distinct values to be sorted. For this reason, my = choice would be to create things like primary/unique keys beforehand = unless I am certain that everything will fit in the available memory.=20 >=20 > Peace > Karen >=20 >=20 >=20 > On May 7, 2012, at 8:05 AM, Johan De Meersman wrote: >=20 >> ----- Original Message ----- >>=20 >>> From: "Zhangzhigang" >>=20 >>> Ok, Creating the index *after* the inserts, the index gets created = in >>> a single operation. >>> But the indexes has to be updating row by row after the data rows = has >>> all been inserted. Does it work in this way? >> No, when you create an index on an existing table (like after a mass = insert), what happens is that the engine does a single full tablescan = and builds the index in a single pass, which is a lot more performant = than updating a single disk block for every record, for the simple = reason that a single disk block can contain dozens of index entries.=20 >>=20 >> Imagine that you insert one million rows, and you have 100 index = entries in a disk block (random numbers, to make a point. Real numbers = will depend on storage, file system, index, et cetera). Obviously = there's no way to write less than a single block to disk - that's how it = works.=20 >>=20 >> You can update your index for each record in turn. That means you = will need to do 1 million index - and thus block - writes; plus = additional reads for those blocks you don't have in memory - that's the = index cache.=20 >>=20 >> Now, if you create a new index on an existing table, you are first of = all bypassing any index read operations - there *is* no index to read, = yet. Then the system is going to do a full tablescan - considered slow, = but you need all the data, so there's no better way anyway. The index = will be built - in-memory as much as possible - and the system will = automatically prefer to write only complete blocks - 10.000 of them. = That's the exact same number of index blocks, but you only write each = block once, so that's only 10.000 writes instead of 1.000.000.=20 >>=20 >> Now there's a lot more at play, things like B-tree balancing and = whatnot, but that's the basic picture.=20 >>=20 >> --=20 >>=20 >> Bier met grenadyn=20 >> Is als mosterd by den wyn=20 >> Sy die't drinkt, is eene kwezel=20 >> Hy die't drinkt, is ras een ezel=20 >=20 >=20 > --=20 > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql