In the last episode (Feb 18), Peter Grigor said:
> I've always thought that adding a count of index entries to the index
> file would be helpful.
> Like (generic index entry): word:15:1,2,5,7,etc...
> where word is the indexed value, 15 is the number of times it
> appears, and 1,2,5,7 is the record offsets (or byte offsets for
> dynamic stuff) of the appearances of the indexed value.
> This way, getting a count of indexed values would be quick, and
> scanning the index would no longer be necessary, because the count
> (15) would tell you how many index entries to skip to get the next
> indexed value in the index.
What you describe above is similar to a bitmap index; they trade off a
more complicated encoding scheme for extremely fast counts. Storing
the record offset is sort of inefficient; that's 4 (8?) bytes per
record. If you either store a bitmap of rows that that value occurs in
(i.e. 1100101 in your example), or delta record/byte offsets
(1,+1,+3,+2,0), and then compress those, you can shrink your indexes to
an incredibly small size. Compressed indexes take a lot longer to
for a description of how I would implement bitmap indexes in MySQL .