List:General Discussion« Previous MessageNext Message »
From:Chris Knipe Date:April 15 2005 5:20pm
Subject:Re: LEFT JOINS same data twice?
View as plain text  
> I eliminated your GROUP BY clause because you weren't actually GROUPing
> anything. You weren't looking for a MIN(), a MAX(), an AVG(), or anything
> else that GROUP BY was intended to be used for. That method of
> de-duplicating results will return potentially misleading information as
> it will randomly pick just one record from all of the records that fit
> within that "group". This way, by using DISTINCT, you can see if things
> change mid-flight or not (which is either an indication of bad data or
> some other troubles).

Ok, again I should have explained this.  The reason why FlightData is so 
huge and growing so rapidly, is because we take a snapshot every minute of 
every single flight we monitor.  I therefore group by a time stamp and/or 
FlightTracker (Depending what data I need), so that I can only get the data 
returned each time the data (plane's position) actually changed.  A quick 
example:

mysql> SELECT * FROM FlightData LIMIT 2\G
*************************** 1. row ***************************
      EntryID: 3471
      Network: V
    TimeStamp: 2005-04-14 07:27:42
      Tracker: df0363ea-fe06-1027-b1dd-00e01884a424
PilotCallSign: xxx
     PilotCID: xxx
PilotRealName: Abdul Henderson KDFW
   CurHeading: 124
       CurAlt: 34887
        CurGS: 469
        Plane: H/DC10/W
  Transponder: 7207
        QNHHg: 29.80
        QNHMb: 1009.14
      Enroute: 06:00:00
         Feul: 10:00:00
   AirportDep: KLAX
   AirportDes: MMMX
   AirportAlt:
  PositionCur: 32.06073/-114.70269
  PositionDep: 33.942536/-118.408075
  PositionDes: 19.435278/-099.072222
         PLGS: 480
        PLAlt: 35000
    PLDepTime: 640
 PLFlightType: I
*************************** 2. row ***************************
      EntryID: 3472
      Network: V
    TimeStamp: 2005-04-14 07:37:42
      Tracker: df0363ea-fe06-1027-b1dd-00e01884a424
PilotCallSign: xxx
     PilotCID: xxx
PilotRealName: xxx
   CurHeading: 255
       CurAlt: 35707
        CurGS: 470
        Plane: B738
  Transponder: 7207
        QNHHg: 29.96
        QNHMb: 1014.56
      Enroute: 02:12:00
         Feul: 05:30:00
   AirportDep: KPDX
   AirportDes: KLAX
   AirportAlt:
  PositionCur: 33.94045/-118.40567
  PositionDep: 45.588722/-122.597500
  PositionDes: 33.942536/-118.408075
         PLGS: 350
        PLAlt: 34000
    PLDepTime: 220
 PLFlightType: I
2 rows in set (0.00 sec)

FlightTracker therefore appears in the database a couple of hundred times 
for each flight.  It is inserted each and every time we poll the flight, 
whether the data has changed or not.  If I do not group therefore, my 
original query would return the Deparutre / Arrival for each and every 
occurance for FlightTracker.  I do however understand what you are saying, 
and I was not aware that it would Group the GROUP / JOIN the entire table 
before adding the exceptions.  This is obviously my problem as well.

Mainly, I use the TimeStamp to determine WHEN last the position of a flight 
was changed (and thus I know when and what change was made in the flight), 
whilst FlightTracker is used to "group" all the occurances of the changes in 
the flight.

 > This command would fit between the CREATE...SELECT... and the second
> SELECT... Even though the docs say that only 1 index is ever used per
> query, I believe that each JOIN can also make use of an index if it exists
> (this belief is based on the results of the EXPLAIN(s) of queries
> containing JOINed tables. However, I could be completely wrong in this
> belief.).
>

I was seriously hoping to be able to do this effeciently with one query. 
But what you are saying does make sense however - especially when the 
database gets bigger and we're sitting with 1GB of flight data for example. 
One question though, if we sit with (for example) with 1,000,000 records 
instead of the odd 350K records in the database now.  Will it not take just 
as long to populate the temp table?

> I do not think this technique is faster if you eliminate the WHERE clause
> from the first query. But if you do have a WHERE, it should return much
> faster than if you simply added a WHERE to your original query.

Ok.  That is something that I was not sure about either yes.  WHERE would 
obviously make the JOINs / GROUP BY smaller in regards to the amount of data 
examined and processed.  I thought perhaps LIMIT would do the same, in that 
it would use only the 30 records specified in my example query.  Obviously 
it JOIN / GROUP the entire table first, and only then return the first 30 
records - which is indeed a big difference.

mysql> SELECT FlightData.PilotCallSign AS FLNum,         FlightLogs.Tracker 
AS Tracker,         FlightLogs.TimeStart AS TimeDep,         Departure.IATA 
AS AirpDep,         Arrival.IATA AS AirpDes,         FlightLogs.Plane AS 
Plane    FROM FlightLogs    LEFT JOIN Airports AS Departure ON 
FlightLogs.AirpDep=Departure.ICAO    LEFT JOIN Airports AS Arrival ON 
FlightLogs.AirpDes=Arrival.ICAO    LEFT JOIN FlightData ON 
FlightLogs.Tracker=FlightData.Tracker   GROUP BY FlightData.Tracker   ORDER 
BY FlightLogs.TimeStart   LIMIT 30;
+---------+--------------------------------------+---------------------+---------+---------+----------+
| FLNum   | Tracker                              | TimeDep             | 
AirpDep | AirpDes | Plane    |
+---------+--------------------------------------+---------------------+---------+---------+----------+
<SNIP>
+---------+--------------------------------------+---------------------+---------+---------+----------+
30 rows in set (14.43 sec)

mysql> SELECT FlightData.PilotCallSign AS FLNum,         FlightLogs.Tracker 
AS Tracker,         FlightLogs.TimeStart AS TimeDep,         Departure.IATA 
AS AirpDep,         Arrival.IATA AS AirpDes,         FlightLogs.Plane AS 
Plane    FROM FlightLogs    LEFT JOIN Airports AS Departure ON 
FlightLogs.AirpDep=Departure.ICAO    LEFT JOIN Airports AS Arrival ON 
FlightLogs.AirpDes=Arrival.ICAO    LEFT JOIN FlightData ON 
FlightLogs.Tracker=FlightData.Tracker WHERE FlightLogs.CCID='927968' GROUP 
BY FlightData.Tracker   ORDER BY FlightLogs.TimeStart;
+--------+--------------------------------------+---------------------+---------+---------+--------+
| FLNum  | Tracker                              | TimeDep             | 
AirpDep | AirpDes | Plane  |
+--------+--------------------------------------+---------------------+---------+---------+--------+
<SNIP>
+--------+--------------------------------------+---------------------+---------+---------+--------+
2 rows in set (0.02 sec)

MAJOR difference indeed ;)  I'll check out the temp tables a bit later after 
I had some sleep.  I think I'll need it for what I'm going to be exploring 
here... hehehe  One last question... A sub select may not perhaps speed 
things up as well?

--
Chris.

Thread
LEFT JOINS same data twice?Chris Knipe13 Apr
  • Re: LEFT JOINS same data twice?Peter Brawley13 Apr
  • Re: LEFT JOINS same data twice?Rhino13 Apr
  • Re: LEFT JOINS same data twice?Roger Baklund13 Apr
  • Re: LEFT JOINS same data twice?Chris Knipe15 Apr
    • Re: LEFT JOINS same data twice?SGreen15 Apr
      • Re: LEFT JOINS same data twice?Peter Brawley15 Apr
      • Re: LEFT JOINS same data twice?Chris Knipe15 Apr
        • Re: LEFT JOINS same data twice?SGreen15 Apr
          • Re: LEFT JOINS same data twice?SGreen15 Apr