I grant you that, however my point was about the total count of the indexed
word's appearance, which would still stay the same, regardless of the
encoding of the position information.
Hoobly Free Classifieds
----- Original Message -----
From: "Dan Nelson" <dnelson@stripped>
To: "Peter Grigor" <pgrigor@stripped>
Cc: "Renars Jeromans" <renars.jeromans@stripped>; <mysql@stripped>
Sent: Tuesday, February 18, 2003 11:12 AM
Subject: Re: Slow retrieval of distinct on indexed fields
> In the last episode (Feb 18), Peter Grigor said:
> > I've always thought that adding a count of index entries to the index
> > file would be helpful.
> > Like (generic index entry): word:15:1,2,5,7,etc...
> > where word is the indexed value, 15 is the number of times it
> > appears, and 1,2,5,7 is the record offsets (or byte offsets for
> > dynamic stuff) of the appearances of the indexed value.
> > This way, getting a count of indexed values would be quick, and
> > scanning the index would no longer be necessary, because the count
> > (15) would tell you how many index entries to skip to get the next
> > indexed value in the index.
> What you describe above is similar to a bitmap index; they trade off a
> more complicated encoding scheme for extremely fast counts. Storing
> the record offset is sort of inefficient; that's 4 (8?) bytes per
> record. If you either store a bitmap of rows that that value occurs in
> (i.e. 1100101 in your example), or delta record/byte offsets
> (1,+1,+3,+2,0), and then compress those, you can shrink your indexes to
> an incredibly small size. Compressed indexes take a lot longer to
> update, though.
> for a description of how I would implement bitmap indexes in MySQL .
> Dan Nelson