"Martijn Tonies" <m.tonies@stripped> wrote on 01/19/2005 03:33:32 AM:
> Hello Jeremy,
> > > I have a curious issue here, maybe someone can help.
> > >
> > > I have a single process that inserts data into tables that contain
> > > purely logging information. This table is then searched by our Care
> > > department to troubleshoot issues. I am looking for the best way to
> > > store this data, and the structure on the backend.
> > >
> > > There are 50 million inserts into table LOG a day. The primary key
> > > on the table is seconds from 1971. I only need to keep 60 days
> > > of data, and the table is only used for read purposes. This is my
> > > design criteria, but my problem is how to delete old data without
> > > crashing the log writer that is attached to the table.
> > OK, how about this:
> > Use MyISAM and MERGE tables. Keep one table per day. E.g.:
> > log_2005_01_15
> > log_2005_01_16
> > log_2005_01_17
> > log_2005_01_18
> > etc.
> > Use MERGE tables to give you the 60 day (and perhaps e.g. 14 day, 30
> > day, 7 day, etc.) read views that you need, like so:
> > CREATE TABLE log_view_7day (
> >    ...
> > ) TYPE=MERGE UNION=(
> >    log_2005_01_12,
> >    ...
> > );
> > You can then do all of your inserts from the log writer into the
> > current day's table, and do your reads against the various MERGEs.
> > Every day at exactly midnight, you would use ALTER TABLE (which is
> > atomic) to redefine the UNION of the MERGE definition of each of the
> > various MERGE tables.
> Modifying metadata because you need a different view of your data?
> Am I the only one to whom this sounds ugly?
> With regards,
> Martijn Tonies
> Database Workbench - developer tool for InterBase, Firebird, MySQL & MS
> SQL Server
> Upscene Productions
It sounds just fine to me. IMHO this is an excellent usage of MySQL
features to solve a technical issue. The original poster wants to maintain
only 60 days worth of data. He also needs really fast inserts. MyISAM can
perform concurrent, non-blocking inserts so long as there are no deletion
gaps in the table to which it is inserting. One solution to this is to
create what is in effect a "view" that spans 60 days worth of data, each
day's data in its own table. However, since views won't be officially
available until a later MySQL version, a MERGE table is an effective
substitute.
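To make the idea concrete, here is a rough sketch of the daily-table plus
MERGE arrangement, including the midnight rotation. All table and column
names here are invented for illustration, and the "..." stands in for the
remaining daily tables:

    -- One MyISAM table per day; log_ts holds seconds since the epoch.
    CREATE TABLE log_2005_01_19 (
        log_ts  INT UNSIGNED NOT NULL,
        message VARCHAR(255),
        KEY (log_ts)
    ) TYPE=MyISAM;

    -- A MERGE "view" spanning the last 60 daily tables (read-only use).
    CREATE TABLE log_view_60day (
        log_ts  INT UNSIGNED NOT NULL,
        message VARCHAR(255),
        KEY (log_ts)
    ) TYPE=MERGE UNION=(log_2004_11_21, ..., log_2005_01_19);

    -- At midnight, atomically swap the oldest day out and the new day in.
    ALTER TABLE log_view_60day
        UNION=(log_2004_11_22, ..., log_2005_01_20);

The log writer never touches the MERGE definition; it only ever appends to
the current day's base table, so the rotation never blocks it.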
This gives him many kinds of granularity options when performing queries,
too. He can query the daily table directly if he knows that the query does
not span days. He could use multiple MERGE tables to aggregate various
ranges of daily data. He could UNION several single-day queries together
to cover a span or list of dates. All of these have various advantages,
the biggest of which is that when it comes time to move the out-of-date
data off-line, he won't have to lock the current day's data to do it.
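For instance, a query spanning a known three-day window could be written
against the daily tables directly (illustrative names again), avoiding the
MERGE table entirely:

    SELECT * FROM log_2005_01_15 WHERE message LIKE '%error%'
    UNION ALL
    SELECT * FROM log_2005_01_16 WHERE message LIKE '%error%'
    UNION ALL
    SELECT * FROM log_2005_01_17 WHERE message LIKE '%error%';

UNION ALL is the right choice here rather than plain UNION, since log rows
from distinct days cannot duplicate each other and the DISTINCT pass would
be wasted work.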
One alternative could be a VIEW built against a single, larger table with
a query specifying a date range of 60 days. Something like:
CREATE VIEW umptyfratz AS
    SELECT * FROM datatable
    WHERE logdate > (NOW() - INTERVAL 60 DAY);
However this introduces several performance hits:
1) the WHERE clause has to be recomputed every time the VIEW is
queried so that you return only the data within the specified date range.
This may not scale well to larger tables.
2) If we delete rows from "datatable" whose logdate is older than 60
days, we create deletion gaps in the data. This prevents MyISAM from simply
appending data to the end of the table (non-blocking inserts) and forces a
table lock for each insert so that the deletion gaps can be filled in
first. This limits insert concurrency badly at 50 million rows a day.
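To spell out what the single-table purge cycle would look like (names are
illustrative, matching the VIEW example above), and why it hurts:

    -- Purge rows older than 60 days; this creates holes in the .MYD file.
    DELETE FROM datatable
     WHERE logdate < (NOW() - INTERVAL 60 DAY);

    -- Defragment so MyISAM can append again instead of filling holes.
    -- OPTIMIZE TABLE rebuilds the table and locks it while it runs.
    OPTIMIZE TABLE datatable;

Both statements lock the table, so on a table taking 50 million inserts a
day the log writer would stall every time the purge ran, which is exactly
the "crashing the log writer" problem the original poster wants to avoid.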
However, there is a dark side to the MERGE table solution, too. As is
explained in the manual (
http://dev.mysql.com/doc/mysql/en/MERGE_storage_engine.html), the merge
table AND each table participating in the merge will consume an operating
system file handle each time the merge table is opened. For a merge table
covering 60 daily tables, that means that the OS has to issue 61 new file
handles PER USER ACCESSING THE TABLE. Depending on the limits of your OS,
you may not be able to open very many files. I would recommend using
weekly tables so that a MERGE table covering 60 days' worth of data would
only need 10 file handles from the OS (9 weeks' worth of data + 1 for the
MERGE table itself).
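A weekly layout might look like this (a sketch with invented names, one
base table per week number; the "..." stands in for the intervening weeks):

    CREATE TABLE log_2005_w03 (
        log_ts  INT UNSIGNED NOT NULL,
        message VARCHAR(255),
        KEY (log_ts)
    ) TYPE=MyISAM;

    CREATE TABLE log_view_60day (
        log_ts  INT UNSIGNED NOT NULL,
        message VARCHAR(255),
        KEY (log_ts)
    ) TYPE=MERGE UNION=(log_2004_w47, ..., log_2005_w03);

Nine weekly base tables plus the MERGE table itself means roughly 10 file
handles per opening, instead of 61, at the cost of rotating the UNION only
once a week and carrying up to 6 extra days of data.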
Of course, it is imperative that before putting such a solution into
production, any DBA test, test, and retest to make sure they are getting
optimal performance and resource usage.
Unimin Corporation - Spruce Pine