In the last episode (Nov 19), Benjamin Pflugmann said:
> On Mon 2002-11-18 at 19:01:57 -0500, dkoch@stripped wrote:
> > A BITMAP index will work efficiently in this case, no matter what
> > the distribution of the keys is. The only requirement is that the
> > column be low-cardinality.
> > It basically works like this:
> > true 011011111
> > false 100100000
> > One bitmap is stored for each possible value of the column. In
> > this case the false bitmap shows that the first and fourth records
> > should be return for false = '1'.
> How/where would the row pointer be stored in this scheme? MySQL does
> not know how to find the 'fourth record' without a pointer to the row
> (and this row pointer is what takes most space in this case).
Bitmap indices are neat. The bitmap values are stored in the same
physical order as the records in the database, so for a fixed-length
table format, you don't even need row pointers. For variable-record
formats, I'd expect bitmaps to be stored in 64k-record groups, with a
pointer to the first row at the start of the block, and byte offsets to
the next row (think linked list). This list would then be compressed,
since the resulting values would be nicely clustered around the average
rowlength. The bitmap values themselves are also compressed.
Theoretically, a bitmap index on a column of 1 million "yes" and 50
"no"s could be one disk block in size. This is an extreme example, of
course, and Oracle states that a bitmap index will usually be around
25% the size of the equivalent btree index. I have seen them as small
as 10% on 100-million-record tables, indexing fields with up to 100
> > A table scan would never be necessary, no matter what the
> > distribution of values.
> A table scan is usually faster than jumping wildly around in the
> table file according to an index, as soon as the number of rows to
> read get over some percentage (for MySQL it's about 1/3rd, normally),
> as disk seeks are rather expensive.
> So I do not see what advantage a bitmap index could bring for the
> discussed case, except for some minor size advantage.
Bitmaps are perfect for data-warehouse type tables. They are great for
doing counts, and are one of the few index types where multiple indices
can be used on a single table because they are so small. Since they
can be combined, one would usually build three bitmaps on three
individual fields instead of having to build btree indices on a+b+c,
b+c, and c.
However, for most OLTP-type tables they suck, because it takes a
relatively massive time to update a bitmap index when a single record
changes (unpack the bitmap block covering to that record, update the
bit, repack). On Oracle at least, expect an update query to take 10 to
100 times longer to run if you have bitmaps on changed fields.
For MySQL, I'd lean toward the "don't index these values" btree
optimization, since it's not really geared toward data warehousing.