At 5:45a -0400 on 28 May 2007, Yves Goergen wrote:
> On 28.05.2007 09:06 CE(S)T, Kevin Hunter wrote:
>> In particular, I imagine a lot of the HTTP requests would be the
>> same, so you could create a table to store the requested URLs, and
>> then have a second table with the timestamp and foreign key
>> relationship into the first.
> Interesting idea. Inserting would be more work to find the already
> present "dictionary" rows.
My guess is not /that/ much work, since you should only have a known and
relatively small set in this dictionary, it'd basically be cached the
whole time. But, that's my guess. Haven't tried it. Practice and
theory . . .
> Also, URLs sometimes contain things like
> session IDs. They're probably not of interest for my use but it's not
> always easy to detect them for removal.
Really? Why wouldn't it be easy to detect them? You presumably know
what variable you're looking for in the URL string, and applying a
simple regex search-and-replace . . . ?
> IP addresses (IPv4) and especially return codes (which can be mapped to
> a 1-byte value) are probably not worth the reference. Data size values
> should be too distributed for this.
Well, presumably, you'd normalize that part of the table. That is,
rather than include multiple foreign keys in your "data" rows, you'd
create a cartesian product of the the return codes with the dictionary
table. You'd have a slightly more bloated dictionary, but depending on
the number of requests the site(s) get(s), the aggregation would more
than make up for it.
> I could also parse user agent
> strings for easier evaluation, but this takes me the possibility to add
> support for newer browsers at a later time. (Well, I could update the
> database from the original access log files when I've updated the UA
Same thought. If you've only a known set of UA strings, you could
normalize them with the dictionary table as well.
> How large is a row reference? 4 bytes?
I don't know, I'm fairly new to MySQL. I suppose it'd also matter on
the type of index. Anyone more knowledgeable wanna pipe up?
Well. Whatever method works for your needs, cool. I'm going to check
out both MYISAMPACK and ARCHIVE. I was unaware of those. Thanks list!