List:Internals« Previous MessageNext Message »
From:Sergei Golubchik Date:October 5 2005 8:33pm
Subject:Re: another short question
View as plain text  
Hi!

On Oct 05, Hagen H?pfner wrote:
> Dear list,
> 
> I thought that I understood the 2-tier structure of the fulltext index
> but as longer as I think about it  ... You know ;-)
> 
> Let me try to explain and let us first assume, that it needs less
> space to use the 2-tier-b-tree anyway. Let us further assume, that we
> have a ft-index that indexes the three words "dreary", "raven" and
> "midnight" ... The world midnight appears in 3 documents and the other
> words appear only one times. I assume, that the 1-level of our tree
> (order 1) now looks like this:
>  
>        
>    midnight, 3, [p1]
>   /                \
> dreary, 1, [p2]     raven, 1, [p3]
> 
> Here, [p1]...[p2] are pointers to the second tear, which also includes
> btrees. But what exacly is stored in the second level. Is it one tree
> for each word? If so, we would have in this example 3 B-trees in the
> second tier, first one starting at (root) [p1], second at [p2] and
> third at [p3]. I now take a look at the second-tier-b-trie starting at
> [p1]:
> 
>   [p1]->  w-for-midnight-doc1, [d1]
>          /                       \
> w-for-midnight-doc2, [d2]        w-for-midnight-in-3, [d3]
> 
> The [d]s are links to the datasets in the data file, are'nt they?
> 
> For space-optimization reasonse Sergei wrote, that the first-tier tree
> might also include direct references to the data file. Let's now
> assume, that this happens for the nodes that represent
> single-document-strings.  Then we would get the following
> tier-1-b-tree:
> 
>    midnight, 3, [p1] ______
>   /                        \
> dreary, w-f-dreary, [d5]   raven, w-f-raven, [d6]
> 
> Am I right?
> 
> If I explained the approach properly then just send me an YES, YOU DID
> IT ;-). Otherwise, please clarify the approach. BTW: where can I find

YES, YOU DID IT

> (in the source code) the functions that compares the space consumption
> of a node?

Two places.

In repair-by-sort process, the index is created as a whole, at once.
sort_ft_key_write() is called for the every word/document (d# in your
text above) pair, and words come already sorted (this is how
repair-by-sort works). The condition you're looking for is right above
the comment /* converting to two-level tree */.

In regular insert, MyISAM uses _mi_insert() function to write the key
value to key page, the code you're looking for is between "Let's
consider converting' comment and /* yup. converting */

It won't be easy to see, probably, but the logic is simple: 2-level tree
takes at least one full keypage (a root page of the second level tree).
Thus, a word is converted to a 2-level if it takes more than one keypage
as a 1-level tree.
 
Regards,
Sergei

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <serg@stripped>
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Kerpen, Germany
       <___/  www.mysql.com
Thread
another short questionHagen Höpfner28 Sep
  • Re: another short questionSergei Golubchik28 Sep
    • Re: another short questionHagen Höpfner5 Oct
      • Re: another short questionSergei Golubchik5 Oct