--- Paul Halliday <paul.halliday@stripped> wrote:
> On 4/23/06, John Hicks <johnlist@stripped> wrote:
> > Paul Halliday wrote:
> > > I am doing queries that produce a table that looks something like
> > >
> > > Count | IP Address | First Seen | Last Seen | Days
> > > 5000 10.0.0.1 2005-12-10 2006-04-15 50*
> > > 6500 10.0.0.2 2006-04-01 2006-04-06 3**
> > >
> > > *So of the 5000 events "count(*) between 2005-12-10 and
> > > the count was distributed within 50 distinct days.
> > >
> > > **This time the count is still high and it occured within 3 days
> > > between 2006-04-01 and 2006-04-06.
> > >
> > > I just cant figure out how to come up with the days part :).
> > Well, I guess that is a *bit* more specific, but not much better :)
> > So I'm guessing that the source of this data is perhaps a web
> access log
> > and that you are tracking IP addresses of visitors.
> > Can we tickle a little more information out of you?
> Ok. The data is IDS events. I am not trying to create any new
> information I just want to extract information. This information will
> be used to relay whether a particular machine has ongoing issues.
> For example,
> SELECT COUNT(src_ip) AS CNT, INET_NTOA(src_ip), MAX(timestamp) FROM
> event WHERE timestamp > '2006-04-24 03:00:00' AND sid="1" AND
> signature_id<>"1" GROUP BY src_ip ORDER BY cnt DESC LIMIT 10
> This will give me the top 10 source addresses for today based on how
> many events they have triggered.
> If they make the top ten, I want to see when we first saw that
> SELECT MIN(timestamp) FROM event WHERE src_ip=INET_ATON('<loop
> top ten>')
> I now want to know; out of all of the days between first seen and
> seen which days had events on them. I dont want the count(events) for
> eah day, just whether there was an event or not so that I know
> first seen and last seen what the rate of appearance was.
> I could do something crufty like this (the row count would be the
> answer I am looking for):
> SELECT DATE_FORMAT(timestamp,'%Y-%m-%d %a') AS DAY, COUNT(*) FROM
> event WHERE timestamp BETWEEN '2006-01-01' AND '2006-04-24' and
> src_ip=INET_NTOA('10.0.0.1') OR dst_ip=INET_ATON('10.0.0.1') GROUP BY
> But that seems like a lot of extra processing.
> Thanks and sorry for the confusion.
> > Ideally, it would be nice to know what task you are trying to
> > What is the source of your data?
> > What is the "condition" you are testing for?
> > And what, very specifically, is it that you would like us to help
> you with.
> > --John
I think what you are looking for is the DISTINCT modifier to the
COUNT() aggregate function. That way you count only how many different
values exist in the list, not how many items are in the list.
Count | IP Address | First Seen | Last Seen | Days
COUNT(src_ip) AS CNT
, MIN(timestamp) first
, MAX(timestamp) last
, COUNT(DISTINCT DATE(timestamp)) days
WHERE timestamp > '2006-04-24 03:00:00'
GROUP BY src_ip
ORDER BY cnt DESC
That is close to what you want. There are several ways to convert
timetsamps values into something that can be counted as a unique day
but I think the DATE function will be fast enough.
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around