List:General Discussion« Previous MessageNext Message »
From:Dan Nelson Date:February 18 2003 4:12pm
Subject:Re: Slow retrieval of distinct on indexed fields
View as plain text  
In the last episode (Feb 18), Peter Grigor said:
> I've always thought that adding a count of index entries to the index
> file would be helpful.
> 
> Like (generic index entry): word:15:1,2,5,7,etc...
> 
> where word is the indexed value, 15 is the number of times it
> appears, and 1,2,5,7 is the record offsets (or byte offsets for
> dynamic stuff) of the appearances of the indexed value.
> 
> This way, getting a count of indexed values would be quick, and
> scanning the index would no longer be necessary, because the count
> (15) would tell you how many index entries to skip to get the next
> indexed value in the index.

What you describe above is similar to a bitmap index; they trade off a
more complicated encoding scheme for extremely fast counts.  Storing
the record offset is sort of inefficient; that's 4 (8?) bytes per
record.  If you either store a bitmap of rows that that value occurs in
(i.e. 1100101 in your example), or delta record/byte offsets
(1,+1,+3,+2,0), and then compress those, you can shrink your indexes to
an incredibly small size.  Compressed indexes take a lot longer to
update, though.

See http://lists.mysql.com/cgi-ez/ezmlm-cgi?1:mss:125271:200211:bmokiphdlmibimfljjkf
for a description of how I would implement bitmap indexes in MySQL .

-- 
	Dan Nelson
	dnelson@stripped
Thread
Slow retrieval of distinct on indexed fieldsRenars Jeromans18 Feb
  • Re: Slow retrieval of distinct on indexed fieldsDan Nelson18 Feb
    • Re: Slow retrieval of distinct on indexed fields [OT?]Michael T. Babcock18 Feb
  • Re: Slow retrieval of distinct on indexed fieldsPeter Grigor18 Feb
    • Re: Slow retrieval of distinct on indexed fieldsDan Nelson18 Feb
  • Re: Slow retrieval of distinct on indexed fieldsPeter Grigor18 Feb