List:General Discussion« Previous MessageNext Message »
From:SGreen Date:April 15 2005 1:54pm
Subject:Re: LEFT JOINS same data twice?
View as plain text  
"Chris Knipe" <savage@stripped> wrote on 04/15/2005 04:36:17 AM:

> >> SELECT tlb1.DepartureLocation AS Departure, tbl1.DepartureLocation AS 

> >> Destination LEFT JOIN DepartureLocation ON 
> >> tlb2.DepartureID=tlb1.LocationID LEFT JOIN DepartureLocation ON 
> >> tbl2.ArrivalID=tbl1.LocationID ORDER BY tbl2.Time etc etc etc
> >>
> >>
> >> I get
> >>
> >> ERROR 1066 (42000): Not unique table/alias: 'tbl1'
> >
> > The statement above looks a bit strange, there is no FROM clause, and 
> > there seems to be 4 tables involved, named tbl1, tlb1, tbl2 and tlb2? 
> > According to your table description above, there is no column named 
> > "DepartureLocation"? According to the query, it seems like a table has 

> > that name?
> 
> Sorry, was trying to keep things simple and made a couple of typos (yes, 

> silly of me, I know).
> 
> 
> 
> > I'll pretend you have two tables named tbl1 and tbl2, containing the 
> > columns you described above:
> >
> > tbl1: LocationID, Location
> > tbl2: DepartureID, ArrivalID, Time
> >
> > Then try something like this:
> >
> > SELECT Departure.Location, Arrival.Location AS Destination, Time
> >   FROM tbl2
> >   LEFT JOIN tbl1 AS Departure ON
> >     Departure.LocationID = DepartureID
> >   LEFT JOIN tbl1 AS Arrival ON
> >     Arrival.LocationID = ArrivalID
> >   ORDER BY Time
> >
> > Note that there are two different uses of alias in this statementtable 

> > alias and column alias: the table "tbl1" is aliased twice, to 
"Departure" 
> > and "Arrival". Aliasing a table is necessary to be able to join the 
same 
> > table multiple times. The "Arrival.Location" column is aliased to 
> > "Destination". This is necessary to avoid two columns from having the 
same 
> > name, in this case both columns would have been named "Location".
> 
> YAY :) Ok, that does explain allot, and it does work yes.  Thank you 
VERY 
> much.
> 
> mysql> SELECT FlightData.PilotCallSign AS FLNum,
>     ->        FlightLogs.Tracker AS Tracker,
>     ->        FlightLogs.TimeStart AS TimeDep,
>     ->        Departure.IATA AS AirpDep,
>     ->        Arrival.IATA AS AirpDes,
>     ->        FlightLogs.Plane AS Plane
>     ->   FROM FlightLogs
>     ->   LEFT JOIN Airports AS Departure ON 
> FlightLogs.AirpDep=Departure.ICAO
>     ->   LEFT JOIN Airports AS Arrival ON 
FlightLogs.AirpDes=Arrival.ICAO
>     ->   LEFT JOIN FlightData ON FlightLogs.Tracker=FlightData.Tracker
>     ->  GROUP BY FlightData.Tracker
>     ->  ORDER BY FlightLogs.TimeStart
>     ->  LIMIT 30;
> +---------+--------------------------------------
> +---------------------+---------+---------+----------+
> | FLNum   | Tracker                              | TimeDep             | 

> AirpDep | AirpDes | Plane    |
> +---------+--------------------------------------
> +---------------------+---------+---------+----------+
> <SNIP>
> +---------+--------------------------------------
> +---------------------+---------+---------+----------+
> 30 rows in set (15.52 sec)
> 
> My next question now, would be where to INDEX.  Size is a MAJOR issue 
here. 
> Two days worth of data, and I am already sitting with a DB over 60MB. 
The 
> above query is simply for debug / testing purposes.  There will be a 
WHERE 
> clause, i.e. WHERE FlightLogs.CCID='blah'  I will more than likely also 
only 
> query BETWEEN certain dates, based on the FlightLogs.StartTime - thus, 
also 
> no LIMIT.
> 
> Airports have INDEX on Name, CountryID, ICAO as well as IATA.
> FlightData have INDEX on PilotCID, PillotCallSign, and Tracker.
> FlightLogs have INDEX on Tracker(Unique), CCID, AirpDes, AirpDep, and 
> TimeStart
> 
> Thus, all columns used for the JOINs, have indexes already?  All GROUP 
BYs 
> have indexes already?  Why is this taking so long?????  FlightData has 
just 
> over 130.5K records, Airports just over 23K records.  Airports won't 
grow 
> much anymore, FlightData will grow to a ENOURMOUSE size, FlightLogs will 
be 
> exponentially smaller than FlightData - but also quite big over time...
> 
> Hope you can be of assistance again :)
> 
> --
> Chris
> 
> 

I think you are ready for the multi-step query. This is a form of 
divide-and-conquer so that your queries need to join smaller sets of data 
(not full tables) so that make the engine do less work. First step, limit 
how many records from FlightLogs we need to JOIN against. By eliminating 
80% or more of the rows that come from FlightLogs, we make the joins with 
FlightData and the other tables just that much faster:

CREATE TEMPORARY TABLE tmpLogs
SELECT Tracker,
       TimeStart,
       AirpDep,
       AirpDes,
       Plane
FROM FlightLogs
WHERE CCID='blah'
        AND TimeStart BETWEEN '2005-01-01' and '2005-01-02';


SELECT DISTINCT FlightData.PilotCallSign AS FLNum 
         tmpLogs.Tracker AS Tracker,
       tmpLogs.TimeStart AS TimeDep,
       Departure.IATA AS AirpDep,
       Arrival.IATA AS AirpDes,
       tmpLogs.Plane AS Plane
FROM tmpLogs
LEFT JOIN Airports AS Departure 
        ON tmpLogs.AirpDep=Departure.ICAO
LEFT JOIN Airports AS Arrival 
        ON tmpLogs.AirpDes=Arrival.ICAO
LEFT JOIN FlightData 
        ON tmpLogs.Tracker=FlightData.Tracker
ORDER BY tmpLogs.TimeStart;

DROP TEMPORARY TABLE tmpFlights;


I eliminated your GROUP BY clause because you weren't actually GROUPing 
anything. You weren't looking for a MIN(), a MAX(), an AVG(), or anything 
else that GROUP BY was intended to be used for. That method of 
de-duplicating results will return potentially misleading information as 
it will randomly pick just one record from all of the records that fit 
within that "group". This way, by using DISTINCT, you can see if things 
change mid-flight or not (which is either an indication of bad data or 
some other troubles).

We could possibly speed up the second JOIN by adding a few indexes to our 
temp table.

ALTER TABLE tmpLogs ADD KEY(Tracker), ADD KEY(AirpDep), ADD KEY(AirpDest);

This command would fit between the CREATE...SELECT... and the second 
SELECT... Even though the docs say that only 1 index is ever used per 
query, I believe that each JOIN can also make use of an index if it exists 
(this belief is based on the results of the EXPLAIN(s) of queries 
containing JOINed tables. However, I could be completely wrong in this 
belief.).

I do not think this technique is faster if you eliminate the WHERE clause 
from the first query. But if you do have a WHERE, it should return much 
faster than if you simply added a WHERE to your original query.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Thread
LEFT JOINS same data twice?Chris Knipe13 Apr
  • Re: LEFT JOINS same data twice?Peter Brawley13 Apr
  • Re: LEFT JOINS same data twice?Rhino13 Apr
  • Re: LEFT JOINS same data twice?Roger Baklund13 Apr
  • Re: LEFT JOINS same data twice?Chris Knipe15 Apr
    • Re: LEFT JOINS same data twice?SGreen15 Apr
      • Re: LEFT JOINS same data twice?Peter Brawley15 Apr
      • Re: LEFT JOINS same data twice?Chris Knipe15 Apr
        • Re: LEFT JOINS same data twice?SGreen15 Apr
          • Re: LEFT JOINS same data twice?SGreen15 Apr