----- Original Message -----
> From: "Zhangzhigang" <zzgang_2008@stripped>
> Ok, Creating the index *after* the inserts, the index gets created in
> a single operation.
> But the indexes has to be updating row by row after the data rows has
> all been inserted. Does it work in this way?
No, when you create an index on an existing table (like after a mass insert), what happens
is that the engine does a single full tablescan and builds the index in a single pass,
which is a lot more performant than updating a single disk block for every record, for
the simple reason that a single disk block can contain dozens of index entries.
Imagine that you insert one million rows, and you have 100 index entries in a disk block
(random numbers, to make a point. Real numbers will depend on storage, file system,
index, et cetera). Obviously there's no way to write less than a single block to disk -
that's how it works.
You can update your index for each record in turn. That means you will need to do 1
million index - and thus block - writes; plus additional reads for those blocks you don't
have in memory - that's the index cache.
Now, if you create a new index on an existing table, you are first of all bypassing any
index read operations - there *is* no index to read, yet. Then the system is going to do
a full tablescan - considered slow, but you need all the data, so there's no better way
anyway. The index will be built - in-memory as much as possible - and the system will
automatically prefer to write only complete blocks - 10.000 of them. That's the exact
same number of index blocks, but you only write each block once, so that's only 10.000
writes instead of 1.000.000.
Now there's a lot more at play, things like B-tree balancing and whatnot, but that's the
basic picture.
--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel