>>>>> "Mike" == Mike Schwartz <mykes@stripped> writes:
Mike> I think I know quite well how indexes work. I am suggesting a modification to
Mike> MySQL that would be beneficial is what's called an index scan or full index
Mike> scan. Rather than using the index for hashed or b-tree fast lookups of single
Mike> key or index values, scan the whole thing.
Mike> The 250K record database consists of 250K documents. There are about 50
Mike> fields of information about the document, and a long variable to hold the
Mike> actual body of the document.
Mike> I created an index that contained the most common of the 50 fields queries acted
Mike> upon. This index fits entirely in memory (250K x 1K/index would fit in 256M of
Mike> RAM!). I would hope that MySQL would be smart enough to use available memory
Mike> to cache entire indexes that easily fit in the machine's RAM.
Mike> A full index scan in memory, doing the regular pattern matches is clearly way
Mike> faster than the 1G+ a full database scan on disk would be.
Mike> For what it's worth, the .ISD file for this database is 791M, the ISM file is
Mike> 135M, and the .frm file is 9K. That 135M should easily fit in RAM on my 256M
Mike> machine. The production servers have 512M... Even if it didn't fit, scanning
Mike> the 135M file on disk would be (logically) almost 8x faster.
Mike> I am not using the context option of oracle. The initora table was set up to
Mike> assure the indexes fit in RAM all the time.
Mike> In case you still need answers to your last set of questions:
Mike> oracle running on a 333MHz celeron (for development), and on a dual PII 450MHz
Mike> system (for production). MySQL tested on the same WWW site logic as with the
Mike> oracle version of the site, with mostly minor tweaks for mysql (table create
Mike> needed massive changes, the queries all seem to work fine).
MYSQL caches all index request in the key_buffer and will only scan
the index if you only refer to columns in it.
One major problem with scanning memory is that it will trash the
CPU memory pipleline completely (which can make programs 20 times
slower than if you can keep most things in the CPU cache)
Another problem is that MySQL users a table handler is a general
interface to tables to be able to work with different table engines.
This adds some layers on how fast one can process data as there is a
lot of function calls involved to transfer information from the index
layer to the MySQL WHERE function handling.
This is not that slow, but we would like to have it MUCH faster; We
plan to allow MySQL to send a function to the table level, so that one
can scan a table in a VERY small loop. This simple addition can make
table scanning 3-10 times faster...
Another problem we found out recently is that if you have many threads
that looks up things in the key cache at the same time, there will be
a very bad mutex contamination in it. If anyone has any nice ideas to
fix this, please mail me! (I have a couple of ideas involving
read/write mutex, but I am still searching after a simple, elegant