From: Johan De Meersman Date: May 7 2012 3:05pm Subject: Re: =?utf-8?B?5Zue5aSN77ya?= Why is creating indexes faster after inserting massive data rows? List-Archive: http://lists.mysql.com/mysql/227342 Message-Id: MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="=_923defb5-67d8-48d8-8cb6-2b59fae001b8" --=_923defb5-67d8-48d8-8cb6-2b59fae001b8 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit ----- Original Message ----- > From: "Zhangzhigang" > Ok, Creating the index *after* the inserts, the index gets created in > a single operation. > But the indexes has to be updating row by row after the data rows has > all been inserted. Does it work in this way? No, when you create an index on an existing table (like after a mass insert), what happens is that the engine does a single full tablescan and builds the index in a single pass, which is a lot more performant than updating a single disk block for every record, for the simple reason that a single disk block can contain dozens of index entries. Imagine that you insert one million rows, and you have 100 index entries in a disk block (random numbers, to make a point. Real numbers will depend on storage, file system, index, et cetera). Obviously there's no way to write less than a single block to disk - that's how it works. You can update your index for each record in turn. That means you will need to do 1 million index - and thus block - writes; plus additional reads for those blocks you don't have in memory - that's the index cache. Now, if you create a new index on an existing table, you are first of all bypassing any index read operations - there *is* no index to read, yet. Then the system is going to do a full tablescan - considered slow, but you need all the data, so there's no better way anyway. The index will be built - in-memory as much as possible - and the system will automatically prefer to write only complete blocks - 10.000 of them. That's the exact same number of index blocks, but you only write each block once, so that's only 10.000 writes instead of 1.000.000. Now there's a lot more at play, things like B-tree balancing and whatnot, but that's the basic picture. -- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel --=_923defb5-67d8-48d8-8cb6-2b59fae001b8--