Hi!
<cut>
Jules> Indeed, I understand about btrees.
Jules> (I haven't read the msyql source code,so apologies if I make a fool of
Jules> myself here)
Jules> If I were implementing a btree index for my data, I'd expect each node
Jules> to look like this:
Jules> struct node {
Jules> node_ptr leftchild;
Jules> node_ptr rightchild;
Jules> data keydata;
Jules> row_id rownumber;
Jules> }
MySQL uses the layout:
left_node key_data row_pointer left_node .... right_node
(This is about the same as your, but a little more efficient)
Jules> (If it were a non-unique index, I'd need all the row_ids, but tests
Jules> reveal that the index which gets huge is (field1,field2), which is
Jules> unique).
Jules> So I'd expect sizeof(node) = sizeof(keydata) + 2 * sizeof (node_ptr) +
Jules> sizeof (row_id).
The above should be about right.
Jules> Since each key is contained in a row, we have for sure sizeof(keydata) <
Jules> sizeof(row) (for each particular row - obviously, some rows are longer
Jules> than others).
Jules> So sizeof(node) < sizeof(row) + 2 * sizeof(node_ptr) +sizeof(row_id).
This isn't necessary right; In the worst case the key tree pages may be
only 50 % full and there may even be 'lost bytes' if the keys doesn't
fit nicely into a 1024 byte page.
As you have varchar() keys, you also need 2 extra bytes / key to store
the key part lengths.
Jules> Now node_ptrs and row_ids fit comfortably into 4 bytes..
Jules> Since there is exactly one node for each row, and each node is at worst
Jules> 12 bytes bigger than the corresponding row, I would expect the btree to
Jules> be around (DB size)+(numrows * 12) = 200M + (9M * 12) = about 300M.
Jules> Now I could imagine some other useful state information you might want
Jules> in the btree, which might push that to 4 or 5 hundred M - but I can't
Jules> get to 2G!
With a key size of 160+16, and a page size of 1024 you may in the
worst case only get in 2-3 keys / page. This will make the .ISM file
a bit bigger (but it shouldn't be 2G in this case)
In MySQL 3.23 (with MyISAM) the key page size may be bigger than 1024
and this will ensure that there is less unused page data for big keys.
Regards,
Monty
| Thread |
|---|
| • Huge index file, why? | Jules Bean | 26 Aug |
| • Huge index file, why? | sinisa | 26 Aug |
| • Re: Huge index file, why? | Jules Bean | 26 Aug |
| • Re: Huge index file, why? | Benjamin Pflugmann | 26 Aug |
| • Re: Huge index file, why? | Michael Widenius | 27 Aug |
| • Re: Huge index file, why? | Michael Widenius | 27 Aug |
| • Re: Huge index file, why? | Jules Bean | 27 Aug |
| • Re: Huge index file, why? | Michael Widenius | 27 Aug |
| • Re: Huge index file, why? | Scott Hess | 27 Aug |
| • Re: Huge index file, why? | Benjamin Pflugmann | 27 Aug |
| • Re: Huge index file, why? | Michael Widenius | 27 Aug |
| • Re: Huge index file, why? | Benjamin Pflugmann | 28 Aug |
| • Re: Huge index file, why? | Jules Bean | 27 Aug |
| • Re: Huge index file, why? | Scott Hess | 27 Aug |
| • Re: Huge index file, why? | Michael Widenius | 29 Aug |
| • Re: Huge index file, why? | V Yau | 30 Aug |
| • Re: Huge index file, why? | Michael Widenius | 30 Aug |