Dear list,
I thought that I understood the 2-tier structure of the fulltext index
but as longer as I think about it ... You know ;-)
Let me try to explain and let us first assume, that it needs less space
to use the 2-tier-b-tree anyway. Let us further assume, that we have a
ft-index that indexes the three words "dreary", "raven" and "midnight"
... The world midnight appears in 3 documents and the other words appear
only one times. I assume, that the 1-level of our tree (order 1) now
looks like this:
midnight, 3, [p1]
/ \
dreary, 1, [p2] raven, 1, [p3]
Here, [p1]...[p2] are pointers to the second tear, which also includes
btrees. But what exacly is stored in the second level. Is it one tree
for each word? If so, we would have in this example 3 B-trees in the
second tier, first one starting at (root) [p1], second at [p2] and third
at [p3]. I now take a look at the second-tier-b-trie starting at [p1]:
[p1]-> w-for-midnight-doc1, [d1]
/ \
w-for-midnight-doc2, [d2] w-for-midnight-in-3, [d3]
The [d]s are links to the datasets in the data file, are'nt they?
For space-optimization reasonse Sergei wrote, that the first-tier tree
might also include direct references to the data file. Let's now assume,
that this happens for the nodes that represent single-document-strings.
Then we would get the following tier-1-b-tree:
midnight, 3, [p1] ______
/ \
dreary, w-f-dreary, [d5] raven, w-f-raven, [d6]
Am I right?
If I explained the approach properly then just send me an YES, YOU DID
IT ;-). Otherwise, please clarify the approach. BTW: where can I find
(in the source code) the functions that compares the space consumption
of a node?
All the best,
Hagen
Sergei Golubchik schrieb:
>Hi!
>
>On Sep 28, Hagen H?pfner wrote:
>
>
>>Looking for information about the "structur" of FT-indexes I found the
>>following page:
>>http://dev.mysql.com/tech-resources/articles/full-text-revealed.html
>>
>>The outhor pointed out, that these information may become obsolete in
>>newer versions. Is it correct, that MySQL 5 also has a simple sorted
>>list of
>>
>>{
>> Word -- VARCHAR. a word within the text.
>> Count -- LONG. how many times word occurs in text.
>>}
>>{
>> Weight -- FLOAT. Our evaluation of the word's importance.
>> Rowid -- a pointer to the row in the data file.
>>}
>>
>>? What is the sorting criteria?
>>
>>
>
>Not exactly. In 4.0 it was a simple btree with a key structure
>
> word VARCHAR(255), weight FLOAT, rowid
>
>where rowid is automatically the last part of any myisam index, not
>something fulltext specific.
>
>In 4.1 it's (what I call a) two-level btree. That is there's a btree or
>keys {word, count}, and every such a key contains a pointer to a root
>of the btree of the structure {weight, rowid}.
>
>Well, it was a simplification. In fact in 4.1 it's a mix of both - some
>nodes of the btree contain word/weight/rowid while others contain
>word/count (format is chosen automatically, depending on what takes less
>space).
>
>In 5.0 the index structure was not changed.
>
>The index is sorted by {word, rowid}, secondary btree is sorted by
>rowid.
>
>Regards,
>Sergei
>
>
>