Hi!
On Oct 05, Hagen H?pfner wrote:
> Dear list,
>
> I thought that I understood the 2-tier structure of the fulltext index
> but as longer as I think about it ... You know ;-)
>
> Let me try to explain and let us first assume, that it needs less
> space to use the 2-tier-b-tree anyway. Let us further assume, that we
> have a ft-index that indexes the three words "dreary", "raven" and
> "midnight" ... The world midnight appears in 3 documents and the other
> words appear only one times. I assume, that the 1-level of our tree
> (order 1) now looks like this:
>
>
> midnight, 3, [p1]
> / \
> dreary, 1, [p2] raven, 1, [p3]
>
> Here, [p1]...[p2] are pointers to the second tear, which also includes
> btrees. But what exacly is stored in the second level. Is it one tree
> for each word? If so, we would have in this example 3 B-trees in the
> second tier, first one starting at (root) [p1], second at [p2] and
> third at [p3]. I now take a look at the second-tier-b-trie starting at
> [p1]:
>
> [p1]-> w-for-midnight-doc1, [d1]
> / \
> w-for-midnight-doc2, [d2] w-for-midnight-in-3, [d3]
>
> The [d]s are links to the datasets in the data file, are'nt they?
>
> For space-optimization reasonse Sergei wrote, that the first-tier tree
> might also include direct references to the data file. Let's now
> assume, that this happens for the nodes that represent
> single-document-strings. Then we would get the following
> tier-1-b-tree:
>
> midnight, 3, [p1] ______
> / \
> dreary, w-f-dreary, [d5] raven, w-f-raven, [d6]
>
> Am I right?
>
> If I explained the approach properly then just send me an YES, YOU DID
> IT ;-). Otherwise, please clarify the approach. BTW: where can I find
YES, YOU DID IT
> (in the source code) the functions that compares the space consumption
> of a node?
Two places.
In repair-by-sort process, the index is created as a whole, at once.
sort_ft_key_write() is called for the every word/document (d# in your
text above) pair, and words come already sorted (this is how
repair-by-sort works). The condition you're looking for is right above
the comment /* converting to two-level tree */.
In regular insert, MyISAM uses _mi_insert() function to write the key
value to key page, the code you're looking for is between "Let's
consider converting' comment and /* yup. converting */
It won't be easy to see, probably, but the logic is simple: 2-level tree
takes at least one full keypage (a root page of the second level tree).
Thus, a word is converted to a 2-level if it takes more than one keypage
as a 1-level tree.
Regards,
Sergei
--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@stripped>
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Senior Software Developer
/_/ /_/\_, /___/\___\_\___/ Kerpen, Germany
<___/ www.mysql.com