List:General Discussion« Previous MessageNext Message »
From:Michael Widenius Date:August 27 1999 12:52pm
Subject:Re: Huge index file, why?
View as plain text  
Hi!

<cut>

Jules> Indeed, I understand about btrees.

Jules> (I haven't read the msyql source code,so apologies if I make a fool of
Jules> myself here)

Jules> If I were implementing a btree index for my data, I'd expect each node
Jules> to look like this:

Jules> struct node {
Jules>   node_ptr leftchild;
Jules>   node_ptr rightchild;
Jules>   data     keydata;
Jules>   row_id   rownumber;
Jules> }

MySQL uses the layout:

left_node key_data row_pointer left_node ....   right_node

(This is about the same as your, but a little more efficient)

Jules> (If it were a non-unique index, I'd need all the row_ids, but tests
Jules> reveal that the index which gets huge is (field1,field2), which is
Jules> unique).

Jules> So I'd expect sizeof(node) = sizeof(keydata) + 2 * sizeof (node_ptr) +
Jules> sizeof (row_id).

The above should be about right.

Jules> Since each key is contained in a row, we have for sure sizeof(keydata) <
Jules> sizeof(row) (for each particular row - obviously, some rows are longer
Jules> than others).

Jules> So sizeof(node) < sizeof(row) + 2 * sizeof(node_ptr) +sizeof(row_id).

This isn't necessary right;  In the worst case the key tree pages may be
only 50 % full and there may even be 'lost bytes' if the keys doesn't
fit nicely into a 1024 byte page.

As you have varchar() keys, you also need 2 extra bytes / key to store 
the key part lengths.

Jules> Now node_ptrs and row_ids fit comfortably into 4 bytes..

Jules> Since there is exactly one node for each row, and each node is at worst
Jules> 12 bytes bigger than the corresponding row, I would expect the btree to
Jules> be around (DB size)+(numrows * 12) = 200M + (9M * 12) = about 300M.

Jules> Now I could imagine some other useful state information you might want
Jules> in the btree, which might push that to 4 or 5 hundred M - but I can't
Jules> get to 2G!

With a key size of 160+16, and a page size of 1024 you may in the
worst case only get in 2-3 keys / page.  This will make the .ISM file
a bit bigger (but it shouldn't be 2G in this case)

In MySQL 3.23 (with MyISAM) the key page size may be bigger than 1024
and this will ensure that there is less unused page data for big keys.

Regards,
Monty
Thread
Huge index file, why?Jules Bean26 Aug
  • Huge index file, why?sinisa26 Aug
  • Re: Huge index file, why?Jules Bean26 Aug
    • Re: Huge index file, why?Benjamin Pflugmann26 Aug
      • Re: Huge index file, why?Michael Widenius27 Aug
    • Re: Huge index file, why?Michael Widenius27 Aug
  • Re: Huge index file, why?Jules Bean27 Aug
    • Re: Huge index file, why?Michael Widenius27 Aug
  • Re: Huge index file, why?Scott Hess27 Aug
    • Re: Huge index file, why?Benjamin Pflugmann27 Aug
      • Re: Huge index file, why?Michael Widenius27 Aug
        • Re: Huge index file, why?Benjamin Pflugmann28 Aug
  • Re: Huge index file, why?Jules Bean27 Aug
  • Re: Huge index file, why?Scott Hess27 Aug
    • Re: Huge index file, why?Michael Widenius29 Aug
Re: Huge index file, why?V Yau30 Aug
  • Re: Huge index file, why?Michael Widenius30 Aug