From: Dan Nelson Date: February 18 2003 4:12pm Subject: Re: Slow retrieval of distinct on indexed fields List-Archive: http://lists.mysql.com/mysql/132709 Message-Id: <20030218161241.GB90978@dan.emsphone.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In the last episode (Feb 18), Peter Grigor said: > I've always thought that adding a count of index entries to the index > file would be helpful. > > Like (generic index entry): word:15:1,2,5,7,etc... > > where word is the indexed value, 15 is the number of times it > appears, and 1,2,5,7 is the record offsets (or byte offsets for > dynamic stuff) of the appearances of the indexed value. > > This way, getting a count of indexed values would be quick, and > scanning the index would no longer be necessary, because the count > (15) would tell you how many index entries to skip to get the next > indexed value in the index. What you describe above is similar to a bitmap index; they trade off a more complicated encoding scheme for extremely fast counts. Storing the record offset is sort of inefficient; that's 4 (8?) bytes per record. If you either store a bitmap of rows that that value occurs in (i.e. 1100101 in your example), or delta record/byte offsets (1,+1,+3,+2,0), and then compress those, you can shrink your indexes to an incredibly small size. Compressed indexes take a lot longer to update, though. See http://lists.mysql.com/cgi-ez/ezmlm-cgi?1:mss:125271:200211:bmokiphdlmibimfljjkf for a description of how I would implement bitmap indexes in MySQL . -- Dan Nelson dnelson@stripped