List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:February 12 2008 5:54pm
Subject:re: HEAP tables - what is the hash data structure?
View as plain text  
Hi!

>>>>> "Igor" == Igor Chernyshev <igor_cc75@stripped> writes:

<cut>

Igor> Here is the new explanation of hash index, based on
Igor> source code analysis and Monty's comments -

Igor> Hash index's key definition has a reference to its
Igor> HP_BLOCK that contains index data. HP_BLOCK contains
Igor> array of HASH_INFO structures. Each HASH_INFO contains
Igor> a pointer to the record data and pointer to the next
Igor> HASH_INFO (which can be null). The total number of
Igor> HASH_INFO allocated in HP_BLOCK is equivalent to the
Igor> maximum number of records that was ever inserted into
Igor> the table (basically, HP_BLOCK grows to the current
Igor> number of records, but never shrinks).

The 'active' number of elements in the HP_BLOCK array shrinks on delete;
If we wanted, we could free the not used elements, in practice we
don't. (Yes, I know that you write about this later, but think it's
good to have here too)

Igor> Note that
Igor> HP_BLOCK is a multi-level structure, instances of
Igor> which are used in all memory-intensive parts of Heap
Igor> Table, such as btree index, hash index and record
Igor> data. HP_BLOCK allocates server's memory (malloc) in
Igor> increments (not HASH_INFO-by-HASH_INFO).

Igor> Heap Engine's hash index implements search, write
Igor> (insert) and delete operations. If a key value in hash
Igor> index has to be updated, Heap Engine will first remove
Igor> and then re-insert that key into the hah index.

Igor> Heap Engine enforces that all HASH_INFO are kept at
Igor> the beginning of HP_BLOCK without any holes between
Igor> HASH_INFO structures. By definition, the index
Igor> contains as many valid HASH_INFO elements as there are
Igor> valid (not deleted) records in the table. These valid
Igor> HASH_INFO elements (those that refer records) are all
Igor> kept to the beginning of HP_BLOCK. Insert and delete
Igor> operations ensure that first record-count-worth of
Igor> HASH_INFO elements in HP_BLOCK are valid, and hp_mask
Igor> ensures that we will never look for a HASH_INFO at a
Igor> position equal to or greater than record count.

Igor> The first important function of hash index is hp_mask.
Igor> hp_mask takes hash code and calculates intended
Igor> position of HEAP_INFO in HP_BLOCK. As explained above,
Igor> the index of the HASH_INFO in HP_BLOCK must be less
Igor> than record count. For that purpose, hp_mask will take
Igor> hash code and try to keep as many bits as possible as
Igor> long as the result is still below record count. The
Igor> bitmask is based on power of 2 next to the current
Igor> record count. If the result is record count or above,
Igor> hp_mask will remove the highest bit from the mask,
Igor> thus guaranteeing that result will be below record
Igor> count. Note that hp_mask always requires hash code and
Igor> so the text below will conveniently omit references to
Igor> this implicit and obvious hash code calculation,
Igor> assuming hash code is always implied.

For refrence, it's good to have the function:

ulong hp_mask(ulong hashnr, ulong buffmax, ulong maxlength)
{
  if ((hashnr & (buffmax-1)) < maxlength) return (hashnr & (buffmax-1));
  return (hashnr & ((buffmax >> 1) -1));
}

Here buffmax is guranteed to be next 2^ of max_length (max_length is
number of rows in table)

Igor> When searching for records, hp_search will take key
Igor> data and use hp_mask to calculate expected position of
Igor> the HEAP_INFO. As explained above, hp_mask will always
Igor> return index of some valid HASH_INFO, which points to
Igor> valid record data. If HEAP_INFO points to the record
Igor> that matches key, hp_search will return that record.

We also calculate the hash for the found record and verify that the
hp_mask() for the found hash gives the same row position.  If it
doesn't match, we know that there can be no match for the data and we
can return at once with a 'row-not-found' error.

Igor> If record does not match the key, hp_search will use
Igor> current HASH_INFO's next_key pointer to advance to the
Igor> next HASH_INFO, and then repeat the test until it
Igor> reaches the end of the linked list (next_key is null).

Igor> However, before making the first advancement,
Igor> hp_search will use hp_mask to calculate position of
Igor> the record that it just found. If position of the
Igor> record does not match position of the key, hp_search
Igor> assumes that HASH_INFO was put into that particular
Igor> place only because there was free space, and assumes
Igor> that recently found HASH_INFO is actually referenced
Igor> by some other HASH_INFO, which means there is no need
Igor> to follow next_key link.

Exactly (Same as I said above)

Regards,
Monty
Thread
HEAP tables - what is the hash data structure?Igor Chernyshev11 Feb
  • RE: HEAP tables - what is the hash data structure?Rick James11 Feb
re: HEAP tables - what is the hash data structure?Igor Chernyshev12 Feb
  • re: HEAP tables - what is the hash data structure?Michael Widenius15 Feb