List: Internals
From: Igor Chernyshev  Date: July 31 2008 7:36pm
Subject: Re: Dynamic record sizes for HEAP engine
Update for those interested in this patch --

Larry Zhou from Google and I have made a few bug fixes.
The diff and zip files have been updated.
Original diff can be found in "Deprecated" downloads.
It's still based on MySQL 5.0.45.

http://code.google.com/p/mysql-heap-dynamic-rows

Thanks,
Igor

--- On Thu, 4/17/08, Igor Chernyshev <igor_cc75@stripped> wrote:

> From: Igor Chernyshev <igor_cc75@stripped>
> Subject: Re: Dynamic record sizes for HEAP engine
> To: "Igor Chernyshev" <igor_cc75@stripped>, "Sergei Golubchik"
> <serg@stripped>
> Cc: internals@stripped
> Date: Thursday, April 17, 2008, 3:55 PM
> Forgot to mention in my previous post - as indicated
> on http://code.google.com/p/mysql-heap-dynamic-rows ,
> this patch has been contributed by eBay, Inc via GPL
> v2. (and, yes, I wrote the code : ))
> 
> Thanks,
> Igor
> 
> --- Igor Chernyshev <igor_cc75@stripped> wrote:
> 
> > I've uploaded the patch to code.google.com. It is
> > based on 5.0.45. See project description, as well as
> > DesignDetails, Usage and PatchFormat Wiki pages for
> > more information.
> > 
> > Note that BLOB support should now be easy to add. This
> > patch provides a new HP_DATASPACE structure, which could
> > be instantiated for a table's BLOB area. As another
> > option, BLOB data could be embedded into the records
> > themselves.
> > 
> > http://code.google.com/p/mysql-heap-dynamic-rows
> > 
> > Below is a copy of design notes from dspace.c (same as
> > DesignDetails Wiki).
> > 
> > Thanks,
> > Igor
> > 
> > ================
> >   MySQL Heap tables keep data in arrays of fixed-size chunks.
> >   These chunks are organized into two groups of HP_BLOCK structures:
> >     - group1 contains indexes, with one HP_BLOCK per key
> >       (part of HP_KEYDEF)
> >     - group2 contains record data, with a single HP_BLOCK
> >       for all records, referenced by HP_SHARE.recordspace.block
> > 
> >   While columns used in an index are usually small, other columns
> >   in the table may need to accommodate larger data. Typically,
> >   larger data is placed into VARCHAR or BLOB columns. With actual
> >   sizes varying, Heap Engine has to support variable-sized records
> >   in memory. Heap Engine implements the concept of dataspace
> >   (HP_DATASPACE), which incorporates HP_BLOCK for the record data,
> >   and adds more information for managing variable-sized records.
> > 
> >   Variable-size records are stored in multiple "chunks",
> >   which means that a single record of data (database "row") can
> >   consist of multiple chunks organized into one "set". HP_BLOCK
> >   contains chunks. In variable-size format, one record
> >   is represented as one or many chunks, depending on the actual
> >   data, while in fixed-size mode, one record is always represented
> >   as one chunk. The index structures always point to the first
> >   chunk in the chunkset.
> > 
> >   At the time of table creation, Heap Engine attempts to find out
> >   if variable-size records are desired. A user can request
> >   variable-size records by providing either row_type=dynamic or
> >   block_size=NNN table create option. Heap Engine will check
> >   whether block_size provides enough space in the first chunk
> >   to keep all null bits and columns that are used in indexes.
> >   If block_size is too small, table creation will be aborted
> >   with an error. Heap Engine will revert to fixed-size allocation
> >   mode if block_size provides no memory benefits (no VARCHAR
> >   fields extending past first chunk).
> > 
> >   In order to improve index search performance, Heap Engine needs
> >   to keep all null flags and all columns used as keys inside
> >   the first chunk of a chunkset. In particular, this means that
> >   all columns used as keys should be defined first in the table
> >   creation SQL. The length of data used by null bits and key columns
> >   is stored as fixed_data_length inside HP_SHARE. fixed_data_length
> >   will extend past the last key column if more fixed-length fields
> >   can fit into the first chunk.
> > 
> >   Variable-size records are necessary only in the presence
> >   of variable-size columns. Heap Engine looks for VARCHAR
> >   columns that declare a length of 32 or more. If no such columns
> >   are found, the table will be switched to fixed-size format. You
> >   should always try to put such columns at the end of the
> >   table definition.
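
The mode selection described in the notes above can be sketched roughly
as follows. This is only an illustration, not code from the patch: the
names sketch_mode and choose_mode are made up, block_size == 0 is assumed
to mean "option not given", and the order of the checks (size error
first, then the fallback to fixed-size) is an assumption as well.

```c
#include <stddef.h>

/* Hypothetical names; not taken from the patch. */
typedef enum { MODE_FIXED, MODE_VARIABLE, MODE_ERROR } sketch_mode;

/*
  dynamic_requested  - nonzero if row_type=dynamic was given
  block_size         - block_size=NNN value, 0 if not given (assumption)
  first_chunk_need   - bytes needed for null bits plus all key columns
  max_varchar_length - largest declared VARCHAR length, 0 if none
*/
static sketch_mode choose_mode(int dynamic_requested, size_t block_size,
                               size_t first_chunk_need,
                               size_t max_varchar_length)
{
  /* Nobody asked for variable-size records. */
  if (!dynamic_requested && block_size == 0)
    return MODE_FIXED;

  /* Null bits and key columns must fit into the first chunk. */
  if (block_size != 0 && block_size < first_chunk_need)
    return MODE_ERROR;

  /* No VARCHAR of declared length 32 or more: nothing to gain. */
  if (max_varchar_length < 32)
    return MODE_FIXED;

  return MODE_VARIABLE;
}
```

For example, choose_mode(1, 8, 10, 100) reports an error because an
8-byte block cannot hold 10 bytes of null bits and keys.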
> > 
> >   Whenever data is being inserted or updated in the table,
> >   Heap Engine will calculate how many chunks are necessary.
> >   For insert operations, Heap Engine allocates a new chunkset in
> >   the recordspace. For update operations it will modify the length
> >   of the existing chunkset, unlinking unnecessary chunks at the end,
> >   or allocating and adding more if larger length is necessary.
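
A minimal sketch of the chunk arithmetic described above, assuming the
packed record is simply cut into pieces of chunk_dataspace_length bytes
(the notes do not say whether a column may straddle two chunks). The
helper names chunks_needed and chunk_delta are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed to hold packed_length bytes of record data.
   A record always owns at least one chunk. */
static size_t chunks_needed(size_t packed_length,
                            size_t chunk_dataspace_length)
{
  assert(chunk_dataspace_length > 0);
  if (packed_length == 0)
    return 1;
  /* Integer ceiling of packed_length / chunk_dataspace_length. */
  return (packed_length + chunk_dataspace_length - 1) /
         chunk_dataspace_length;
}

/* For an update: a positive result is the number of chunks to allocate
   and append, a negative result is the number to unlink from the end. */
static long chunk_delta(size_t old_length, size_t new_length,
                        size_t chunk_dataspace_length)
{
  return (long) chunks_needed(new_length, chunk_dataspace_length) -
         (long) chunks_needed(old_length, chunk_dataspace_length);
}
```

With 64-byte chunks, growing a record from 100 to 300 packed bytes goes
from 2 chunks to 5, so chunk_delta returns 3.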
> > 
> >   When writing data to chunks or copying data back to record,
> >   Heap Engine will first copy fixed_data_length of data using a
> >   single memcpy call. The rest of the columns are processed
> >   one-by-one. Non-VARCHAR columns are copied in their full format.
> >   VARCHARs are copied based on their actual length. Any NULL
> >   values after fixed_data_length are skipped.
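
The copy order above can be sketched like this, packing into one flat
buffer instead of real chunks to keep it short. The col_desc structure
and pack_record are hypothetical and are not the patch's data
structures; in particular, the real code reads NULL flags from the null
bits and derives a VARCHAR's actual length from its length prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical column descriptor, for illustration only. */
typedef struct {
  size_t offset;      /* offset of the column in the full-width record */
  size_t length;      /* full (declared) width inside the record */
  int    is_varchar;  /* nonzero for VARCHAR columns */
  int    is_null;     /* nonzero if the value is NULL */
  size_t actual_len;  /* bytes really in use (VARCHAR only) */
} col_desc;

/* Packs rec into dst and returns the packed length.
   dst must be at least as large as the full record. */
static size_t pack_record(unsigned char *dst, const unsigned char *rec,
                          size_t fixed_data_length,
                          const col_desc *cols, size_t ncols)
{
  size_t pos = fixed_data_length;
  size_t i;

  assert(dst != NULL && rec != NULL);

  /* Null bits, key columns and any leading fixed fields: one memcpy. */
  memcpy(dst, rec, fixed_data_length);

  for (i = 0; i < ncols; i++) {
    const col_desc *c = &cols[i];
    size_t n;

    if (c->offset < fixed_data_length)
      continue;                 /* already covered by the memcpy above */
    if (c->is_null)
      continue;                 /* NULL values take no space */

    n = c->is_varchar ? c->actual_len : c->length;
    memcpy(dst + pos, rec + c->offset, n);
    pos += n;
  }
  return pos;
}
```

The returned length is what the chunk count would be computed from.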
> > 
> >   The allocation and contents of the actual chunks vary between
> >   fixed and variable-size modes. Total chunk length is always
> >   aligned to the next sizeof(byte*). Here is the format of
> >   a fixed-size chunk:
> >       byte[] - sizeof=chunk_dataspace_length, but at least
> >                sizeof(byte*) bytes. Keeps actual data or pointer
> >                to the next deleted chunk.
> >                chunk_dataspace_length equals full record length
> >       byte   - status field (1 means "in use", 0 means "deleted")
> >   Variable-size uses a different format:
> >       byte[] - sizeof=chunk_dataspace_length, but at least
> >                sizeof(byte*) bytes. Keeps actual data or pointer
> >                to the next deleted chunk.
> >                chunk_dataspace_length is set according to table
> >                setup (block_size)
> >       byte*  - pointer to the next chunk in this chunkset,
> >                or NULL for the last chunk
> >       byte   - status field (1 means "first", 0 means "deleted",
> >                2 means "linked")
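
As a rough illustration of the two layouts, the total chunk length could
be computed as below. The function names are hypothetical, and the
sketch ignores any padding the patch may insert between the data area
and the next-chunk pointer; it only applies the two stated rules (data
area of at least sizeof(byte*), total rounded up to sizeof(byte*)).

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of the pointer size. */
static size_t align_to_ptr(size_t n)
{
  size_t a = sizeof(unsigned char *);
  return (n + a - 1) / a * a;
}

/* The data area must be able to hold a pointer, because a deleted
   chunk stores the link to the next deleted chunk there. */
static size_t data_area(size_t chunk_dataspace_length)
{
  size_t min = sizeof(unsigned char *);
  return chunk_dataspace_length < min ? min : chunk_dataspace_length;
}

/* Fixed-size chunk: data area + 1 status byte. */
static size_t fixed_chunk_length(size_t chunk_dataspace_length)
{
  return align_to_ptr(data_area(chunk_dataspace_length) + 1);
}

/* Variable-size chunk: data area + next-chunk pointer + 1 status byte. */
static size_t variable_chunk_length(size_t chunk_dataspace_length)
{
  assert(chunk_dataspace_length > 0);  /* block_size must be positive */
  return align_to_ptr(data_area(chunk_dataspace_length) +
                      sizeof(unsigned char *) + 1);
}
```

Under these assumptions a variable-size chunk costs one pointer more
than a fixed-size chunk with the same data area, which is the overhead
to weigh when choosing block_size.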
> > 
> >   When allocating a new chunkset of N chunks, Heap Engine will try
> >   to allocate chunks one-by-one, linking them as they become
> >   allocated. Allocation of a single chunk will attempt to reuse
> >   a deleted (freed) chunk. If no free chunks are available,
> >   it will attempt to allocate a new area inside
> > 
> === message truncated ===


      