| List: | General Discussion | « Previous MessageNext Message » | |
| From: | Karen Abgarian | Date: | May 8 2012 3:37am |
| Subject: | Re: 回复: 回复 : Why is creating indexes faster after inserti ng massive data rows? | ||
| View as plain text | |||
Honestly, I did not understand that. I did not say anything about being complicated. What does mysql not use, caching?? Judging by experience, creating a unique index on say, a 200G table could be a bitter one. On 07.05.2012, at 19:26, Zhangzhigang wrote: > Karen... > > The mysql does not use this approach what you said which is complicated. > > I agree with ohan De Meersman. > > > ________________________________ > 发件人: Karen Abgarian <abvk@stripped> > 收件人: mysql@stripped > 发送日期: 2012年5月8日, 星期二, 上午 1:30 > 主题: Re: 回复: Why is creating indexes faster after inserting massive data > rows? > > Hi, > > A couple cents to this. > > There isn't really a million of block writes. The record gets added to the block, > but that gets modified in OS cache if we assume MyISAM tables and in the Innodb buffer if > we assume InnoDB tables. In both cases, the actual writing does not take place and does > not slow down the process. What does however happen for each operation, is processing > the statement, locating the entries to update in the index, index block splits and , for > good reason, committing. > > When it comes to creating an index, what needs to happen, is to read the whole table > and to sort all rows by the index key. The latter process will be the most determining > factor in answering the original question, because for the large tables the sort will have > to do a lot of disk I/O. The point I am trying to make is there will be situations when > creating indexes and then inserting the rows will be faster than creating an index > afterwards. If we try to determine such situations, we could notice that the likelihood > of the sort going to disk increases with the amount of distinct values to be sorted. For > this reason, my choice would be to create things like primary/unique keys beforehand > unless I am certain that everything will fit in the available memory. > > Peace > Karen > > > > On May 7, 2012, at 8:05 AM, Johan De Meersman wrote: > >> ----- Original Message ----- >> >>> From: "Zhangzhigang" <zzgang_2008@stripped> >> >>> Ok, Creating the index *after* the inserts, the index gets created in >>> a single operation. >>> But the indexes has to be updating row by row after the data rows has >>> all been inserted. Does it work in this way? >> No, when you create an index on an existing table (like after a mass insert), > what happens is that the engine does a single full tablescan and builds the index in a > single pass, which is a lot more performant than updating a single disk block for every > record, for the simple reason that a single disk block can contain dozens of index > entries. >> >> Imagine that you insert one million rows, and you have 100 index entries in a > disk block (random numbers, to make a point. Real numbers will depend on storage, file > system, index, et cetera). Obviously there's no way to write less than a single block to > disk - that's how it works. >> >> You can update your index for each record in turn. That means you will need to do > 1 million index - and thus block - writes; plus additional reads for those blocks you > don't have in memory - that's the index cache. >> >> Now, if you create a new index on an existing table, you are first of all > bypassing any index read operations - there *is* no index to read, yet. Then the system is > going to do a full tablescan - considered slow, but you need all the data, so there's no > better way anyway. The index will be built - in-memory as much as possible - and the > system will automatically prefer to write only complete blocks - 10.000 of them. That's > the exact same number of index blocks, but you only write each block once, so that's only > 10.000 writes instead of 1.000.000. >> >> Now there's a lot more at play, things like B-tree balancing and whatnot, but > that's the basic picture. >> >> -- >> >> Bier met grenadyn >> Is als mosterd by den wyn >> Sy die't drinkt, is eene kwezel >> Hy die't drinkt, is ras een ezel > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql
