I am trying to determine the best way to manage very large (MyISAM) tables so that
they can be queried in a reasonable amount of time. One table in particular has over 18
million records (8GB data) and is growing by more than 150K records per day, and that
rate is increasing. Besides the obvious things like better hardware and load-balancing
across multiple replicating databases, I am trying to determine how to keep these data
sets optimized for fastest queries. In my particular situation, the most recent data is
queried most often, and data over 30-45 days old is not queried much at all but still
must remain accessible. Each record has an integer time column that is indexed for
querying over periods of time.

Currently I run a script regularly that moves records
older than 45 days from tables in the main database into identical tables in another
(archive) database running in the same server process. This seems to speed up the tables
in the main database, but I realize that deleting records leaves holes in the tables,
which slows inserts and also disables MyISAM's concurrent inserts, so the tables can
no longer be read and written at the same time.

My question is: is it better to keep all of the data in the original
tables to avoid holes, or is 'archiving' records to another database a wise approach?
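For concreteness, the move my script performs each run amounts to something like the
following (the database, table, and column names here are illustrative, not my real
schema, and I am assuming for the example that the integer time column holds Unix
timestamps):

```sql
-- Compute the cutoff once so the INSERT and DELETE cover exactly
-- the same set of rows.
SET @cutoff = UNIX_TIMESTAMP() - (45 * 24 * 60 * 60);

-- Copy rows older than 45 days into the identical table in the
-- archive database (same server process)...
INSERT INTO archive_db.events
SELECT * FROM main_db.events WHERE time < @cutoff;

-- ...then remove them from the live table. This DELETE is what
-- leaves the holes I am worried about.
DELETE FROM main_db.events WHERE time < @cutoff;
```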
How much does the size of a table really affect performance when querying the more recent
data? If archiving is reasonable, is there a way to optimize the tables to get rid of the
holes without significantly impacting ongoing activity on these tables?
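For reference, the "recent data" queries in question are simple range scans on the
indexed integer time column, along these lines (again, names are illustrative and a
Unix-timestamp column is assumed):

```sql
-- Typical query for recent activity: a range scan that should be
-- served by the index on the integer `time` column.
SELECT *
FROM main_db.events
WHERE time >= UNIX_TIMESTAMP() - (7 * 24 * 60 * 60)
ORDER BY time DESC;
```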
Thanks for your time!