Paul Halliday wrote:
> I have 2 tables:
> 1) Event Data
> 2) Mappings
> The query should return something like this:
> Hits IP Country Code
> 20 220.127.116.11 SE
> I am trying this:
> SELECT COUNT(event.src_ip) AS count, INET_NTOA(event.src_ip),
> mappings.cc FROM event, mappings WHERE event.timestamp BETWEEN
> '2009-12-06 20:00' and '2009-12-07 20:00:00' AND event.src_ip BETWEEN
> 'mappings.start_ip' AND 'mappings.end_ip' GROUP BY event.src_ip ORDER
> BY count DESC LIMIT 20;
> Am I supposed to do a join somewhere? Do joins even apply in a where
> clause? or am I totally off the mark.
> Singularly, the queries look like this:
> SELECT cc FROM mappings WHERE INET_ATON('src_ip') BETWEEN start_ip AND end_ip;
> SELECT COUNT(src_ip) AS count, INET_NTOA(src_ip) FROM event WHERE
> timestamp BETWEEN '2009-12-06 20:00' and '2009-12-07 20:00:00' GROUP
> BY src_ip ORDER BY count DESC LIMIT 20;
Technically speaking, you are already doing a JOIN in your query. The
comma operator in the FROM clause combined with the WHERE conditions
make your query logically equivalent to the following rewrite:
SELECT COUNT(event.src_ip) AS count
INNER JOIN mappings
ON event.src_ip BETWEEN 'mappings.start_ip' AND 'mappings.end_ip'
WHERE event.timestamp BETWEEN '2009-12-06 20:00' and '2009-12-07 20:00:00'
GROUP BY event.src_ip
ORDER BY count DESC
However, as we document in the manual, we have demoted the execution
precedence of the comma operator to form what I like to call an
"implicit join" to be evaluated AFTER any explicit JOIN clauses.
What this means is that you may get better performance out of an
explicit join than you do an implicit join.
Also, you may want to consider rewriting your matching condition so that
it can use an index or a combination of indexes on your `start_ip` and
`end_ip` columns (notice my use of backticks ` ` not single quotes ' '
to identify column names) by rewriting your condition as an AND
comparison instead of a BETWEEN comparison
ON event.src_ip >= mappings.start_ip
AND event.src_ip <= mappings.end_ip
The way it is written now: "<col> BETWEEN <colA> and <colB>" is not one
of the cases that we have an optimized and generalized execution plan to
handle well. The separate conditions, like I wrote in my example, is an
optimized situation and has a much better likelihood of using an index
during its evaluation.
I would think that an index on both columns would work better than two
ALTER TABLE event ADD KEY(src_ip, end_ip);
Shawn Green, MySQL Senior Support Engineer
Sun Microsystems, Inc.
Office: Blountville, TN