List:General Discussion« Previous MessageNext Message »
From:Shawn Green Date:December 12 2009 7:49pm
Subject:Re: Join on a where clause.
View as plain text  
Hello Paul,

Paul Halliday wrote:
> I have 2 tables:
> 
> 1) Event Data
> 2) Mappings
> 
> The query should return something like this:
> 
> Hits      IP                      Country Code
> 20        213.136.52.29     SE
> 
> I am trying this:
> 
> SELECT COUNT(event.src_ip) AS count, INET_NTOA(event.src_ip),
> mappings.cc FROM event, mappings WHERE event.timestamp BETWEEN
> '2009-12-06 20:00' and '2009-12-07 20:00:00' AND event.src_ip BETWEEN
> 'mappings.start_ip' AND 'mappings.end_ip' GROUP BY event.src_ip ORDER
> BY count DESC LIMIT 20;
> 
> Am I supposed to do a join somewhere? Do joins even apply in a where
> clause? or am I totally off the mark.
> 
> Singularly, the queries look like this:
> 
> SELECT cc FROM mappings WHERE INET_ATON('src_ip') BETWEEN start_ip AND end_ip;
> 
> SELECT COUNT(src_ip) AS count, INET_NTOA(src_ip) FROM event WHERE
> timestamp BETWEEN '2009-12-06 20:00' and '2009-12-07 20:00:00' GROUP
> BY src_ip ORDER BY count DESC LIMIT 20;
> 

Technically speaking, you are already doing a JOIN in your query. The 
comma operator in the FROM clause combined with the WHERE conditions 
make your query logically equivalent to the following rewrite:

SELECT COUNT(event.src_ip) AS count
, INET_NTOA(event.src_ip)
, mappings.cc
FROM event
INNER JOIN mappings
   ON event.src_ip BETWEEN 'mappings.start_ip' AND 'mappings.end_ip'
WHERE event.timestamp BETWEEN '2009-12-06 20:00' and '2009-12-07 20:00:00'
GROUP BY event.src_ip
ORDER BY count DESC
LIMIT 20;

However, as we document in the manual, we have demoted the execution 
precedence of the comma operator to form what I like to call an 
"implicit join" to be evaluated AFTER any explicit JOIN clauses.
http://dev.mysql.com/doc/refman/5.0/en/join.html

What this means is that you may get better performance out of an 
explicit join than you do an implicit join.

Also, you may want to consider rewriting your matching condition so that 
it can use an index or a combination of indexes on your `start_ip` and 
`end_ip` columns (notice my use of backticks ` ` not single quotes ' ' 
to identify column names) by rewriting your condition as an AND 
comparison instead of a BETWEEN comparison

ON event.src_ip >= mappings.start_ip
   AND event.src_ip <= mappings.end_ip

The way it is written now: "<col> BETWEEN <colA> and <colB>" is not one 
of the cases that we have an optimized and generalized execution plan to 
handle well.  The separate conditions, like I wrote in my example, is an 
optimized situation and has a much better likelihood of using an index 
during its evaluation.

I would think that an index on both columns would work better than two 
single-column indexes.

ALTER TABLE event ADD KEY(src_ip, end_ip);

Warmest regards,
-- 
Shawn Green, MySQL Senior Support Engineer
Sun Microsystems, Inc.
Office: Blountville, TN


Thread
Join on a where clause.Paul Halliday9 Dec
  • RE: Join on a where clause.Neil Aggarwal9 Dec
    • Re: Join on a where clause.Joerg Bruehe9 Dec
      • RE: Join on a where clause.Neil Aggarwal9 Dec
      • Re: Join on a where clause.Paul Halliday10 Dec
        • Re: Join on a where clause.Andy Wallace10 Dec
        • Re: Join on a where clause.Joerg Bruehe12 Dec
  • Re: Join on a where clause.Shawn Green12 Dec