At 5:32a -0400 on Fri, 16 May 2008, Ananda Kumar wrote:
>>> 2 Index would do range scan, which would be comparitivly slower.
>> Not exactly. This, again, depends on implementation and the
>> data against which queries are run. An index is an index.
>> Remember, a primary key is implemented by the MySQL engine as
>> a unique not null b-tree index. In fact, trying to get
>> low-cardinality data from a table via indexes could even be
>> faster than the primary key. Consider a table with 1,000,001
>> rows of favorite colors of people. Let's say 1 million
>> people like the color blue, but only 1 person who likes the
>> color puce. An index for that single person would be a huge
>> win. Depending on the DB (read: implementation) one could make
>> this particular style of skewed data even faster. Postgres,
>> for example, has partial-indexes.
> As stated by you as above. Here puce is a clear case of just one
> record being read, which is very much similar to reading a record
> with a PRIMARY KEY, which is supposed to be fast for HIGH
> CARDINALITY, if there were 1/4 million people who liked puce, then
> the index would a RANGE SCAN and would be slow.
Eh, again, it's hard to say. In my contrived example, puce is a clear
case of an index win via the fact that it's unique, but it's *not* like
the primary key because every other key isn't unique. A primary key
enforces high-cardinality, so selecting individual rows is ostensibly
fast, but now lets select a range of rows. What if we made ten people
like the color red and SELECTed them? How about 2,000 green? 100,000
yellow? An index perusal would still weed out >= 900,000 rows, so is it
a win? Try it an find out. I don't know the answer.
Getting data from the index isn't free. It /can/ be a win when it helps
remove a statistically strong portion of the data, dependent on disk
speed, cpu speed, and a data cardinality. Sometimes, depending on the
what rows and data it takes to satisfy a query, it's actually better to
ignore the indexes and do a range scan of the table.
The point is that it isn't as simple as snap-your-fingers,-you're-done,
which I gather we both understand. I suggest again to the OP to do some
tests on /personal data sets/ and see what answers needs.
Besides, who likes puce anyway? ;-)