sam wun <sam.wun@stripped> wrote on 01/20/2005 11:45:40 AM:
> SGreen@stripped wrote:
> > SELECT DISTINCT i.basename
> > FROM inventory i
> > INNER JOIN transaction t
> > ON i.prodcode = t.prodcode
> > AND t.date >= '2004-01-01'
> > AND t.date <= '2004-01-31'
> > INNER JOIN transaction tt
> > ON i.prodcode = tt.prodcode
> > AND tt.date >= '2005-01-01'
> > AND tt.date <= '2005-01-31'
> > INNER JOIN customer c
> > ON c.custcode = t.custcode
> > AND c.custcode = tt.custcode
> > WHERE i.basename is not NULL
> > and i.prodname is not NULL
> > order by i.basename
> > This should give you a list of inventory basenames for all current
> > customers (their names are still in the customer table) that "had
> > transactions" (ordered products?) during both JAN 2004 and JAN 2005.
> > This list will show only the products that were ordered during BOTH
> > time periods BY THE SAME CUSTOMER at least once (at least one repeat
> > sale, year to year, in JAN). Is this what you were after or was there
> > a different question you were trying to answer?
> Hi, while this is much faster than subquery, I found there is "cumsy"
> way faster than this series inner joins.
> I dicovered that if I create two different temporary tables with "create
> view as select ..." for 2 differnet period, the join between this temp
> tables is also much faster than this series inner joins approach. With
> 160000 records in the transaction table, the series inner joins takes
> very long time (more than 15 minutes) give out a result.
> Any comment?
> > Shawn Green
> > Database Administrator
> > Unimin Corporation - Spruce Pine
Once you start getting into larger sets of data (millions of rows), Taking
snapshots (as you did with your CREATE TABLE ... SELECT statements) can
provide some significant performance boosts.
Here is a numerical analysis of the two situations:
Assume we have a table with 1000000 (1Millon = 1e6) rows in it that needs
to be joined to itself for a query (much as in the query above). Computing
the JOIN of two tables that each contain 1e6 rows could result in a
virtual table containing up to 1e12 rows. The actual number of rows in a
JOIN will be smaller than the pure Cartesian product because it will be
limited to only those rows match your ON conditions. However the number
1e12 represents how many permutations of data the engine must decide
between in order to calculate the results of the JOIN.
Let's imagine that instead of needing to join the entire table to itself,
that we only need to match 10000 (1e4) rows of the table's data against
another set of 10000 rows (two nice round figures that could stand in for
the actual number of "transaction" records during January of each year in
our original query data). A JOIN between those two sets of rows would
result in a maximum Cartesian product of only 1e8 rows.
If it takes a fixed length of time (not a variable length of time) for the
query engine to decide if any one row of a JOIN's Cartesian product
belongs to the final JOIN results (based on evaluating the ON conditions
against each row combination) then the subset JOIN will reduce the time it
takes to compute the virtual table by a factor of 10000 (1e12/1e8 = 1e4).
That means that the query engine made 999,999,990,000 fewer comparisons to
build the virtual table based on the JOINed sub-tables than it would need
to build the same table based on the whole table joined to itself.
Your results seem to support this analysis. The time you spent creating
the subsets of data (time to select rows + time to write them into a
table, twice) was much, MUCH less than the time it would take to sort
through all of the other possible combinations of data in your original
table that didn't need to participate in your analysis.
As I said at the beginning, this is a query optimization technique for
certain queries against "larger" datasets. For queries that deal with
tables on the order of a few hundred thousand rows or less, it may not
apply. Your mileage will vary. Only actual testing will reveal the actual
performance of one technique over the other.
Unimin Corporation - Spruce Pine