At 12:31a -0400 on 28 May 2007, Dan Nelson wrote:
> In the last episode (May 27), Yves Goergen said:
>> I'm thinking about using a MySQL table to store an Apache access log
>> and do statistics on it. Currently all access log files are stored as
>> files and compressed by day. Older log files are compressed by month,
>> with bzip2. This gives a very good compression ratio, since there's a
>> lot of repetition in those files. If I store all that in a regular
>> table, it would be several gigabytes large. So I'm looking for a way
>> to compress the database table but still be able to append new rows.
>> Given the nature of a log file, there is no need to alter previous
>> data; at most it might be useful to delete older rows. Do you know
>> of something for that?
> You want the ARCHIVE storage engine.
Huh. This is the first I've heard of the archive engine. Cool!
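For anyone else seeing it for the first time, here's a minimal sketch of what that might look like (column names and widths are invented for illustration):

```sql
-- Hypothetical access-log table using the ARCHIVE engine.
CREATE TABLE access_log (
  ts     DATETIME     NOT NULL,
  ip     VARCHAR(15)  NOT NULL,
  method VARCHAR(8)   NOT NULL,
  url    VARCHAR(255) NOT NULL,
  status SMALLINT     NOT NULL,
  bytes  INT          NOT NULL
) ENGINE=ARCHIVE;

-- ARCHIVE supports INSERT and SELECT but not UPDATE or DELETE,
-- which happens to match append-only log data. Rows are compressed
-- with zlib as they're inserted, and OPTIMIZE TABLE repacks the
-- data file for a better ratio.
```

The no-UPDATE/no-DELETE restriction means you can't prune old rows in place, though; you'd have to rotate whole tables (say, one per month) and drop them as they age out.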
However, I'm curious how the compression offered by OPTIMIZE TABLE and
the zlib library would compare to denormalization of the log schema. In
particular, I imagine a lot of the HTTP requests would be the same, so
you could create a table to store the requested URLs, and then have a
second table with the timestamp and foreign key relationship into the
first. Depending on how wide the original rows are and how often
they're requested, I imagine you could get quite a savings. Anything
else that's repeated as well? IPs? Return codes?
Would be curious about the results if you were able to implement both.