Thanks a lot Shawn. As always, your advice has been very helpful.
On 2/3/06, SGreen@stripped <SGreen@stripped> wrote:
> Scott Klarenbach <doyouunderstand@stripped> wrote on 02/02/2006 02:01:11
> > I have a table `requirement` which is left joining to a table
> > based on a matching `partNumber` column. The inventory table has
> > of records, the requirement table has tens of thousands. I'm noticing
> > the left join between requirement and inventory doesn't take advantage
> of a
> > LIMIT clause. So whether I select all records from requirement or limit
> > to 50, the LEFT JOIN operation still seems to be calculating for ALL
> > requirement records against ALL inventory records. (The query takes the
> > exact same amount of time, whether I pull 50 requirement records or
> > How can I force mysql to only join the inventory table for the those 50
> > records brought back by the LIMIT clause?
> > What I would do in a more powerful DB like SQL Server, is build a
> > table with my 50 requirement rows, and then perform the inventory join
> > the temp table. But due to MySQL SPROC limitations (ie, LIMIT clauses
> > have integer constants, not parameters) and View limititations (ie, no
> > indexing of views), I'd have to build this temporary table and the rest
> > query in PHP first, which is really ugly.
> > I'm hoping there is a nice SQL trick I can use with MySQL to restrict
> > join to only those records that would come back from the limit set.
> > Thanks,
> > Scott Klarenbach
> Yes, and no. You cannot apply a LIMIT specifically to a JOIN clause
> unless you break your query into separate pieces and put limits on each of
> them. What happens during the normal execution of a query is that after
> parsing and planning the engine begins collecting and combining the source
> data. Which records are combined and matched against which others is defined
> in the FROM clause and all of the JOIN clauses.
> The equivalent to a large virtual table (similar to saying "SELECT * FROM
> <all involved tables>") is created in memory. The only restrictions to which
> rows of data make it into this first processing stage come from the ON
> clauses (and any WHERE clauses the optimizer _may_ choose to include)
> defined between the JOINed tables. Next comes WHERE clause processing, then
> GROUP BY processing, HAVING processing, ORDER BY processing, and finally
> LIMIT processing.
> As you can see by the flow of query execution, LIMIT clauses are really
> only useful for restricting how much data is finally sent to the user. In
> order to minimize how much processing your CPU has to do to compute a
> particular query you have several tools at your disposal: indexes, temporary
> tables, and stepwize result construction.
> JOINing tables is a geometrically expensive action. The number of
> potential row matches increase by the product of the number of rows in each
> table involved in the join. If you can preselect certain target rows from
> your really large tables into smaller temporary tables and build your final
> result set from them, the query processor will only need to compute a small
> fraction of the row comparisons it would have had to perform compared to the
> number of row comparisons necessary to JOIN your original tables. Take this
> rough math as an example:
> TABLE A: 10000 rows
> TABLE B: 10000 rows
> SELECT * from A INNER JOIN B ON A.id <http://a.id/> = B.A_ic;
> There are potentially 10000 x 10000 = 100000000 (1.0e+08) row combinations
> to be checked. If instead of joining A to B, we create two derivative tables
> called C and D (assuming we don't change the column names)
> TABLE A -> TABLE C: 5000 rows
> TABLE B -> TABLE D: 1000 rows
> SELECT * from C INNER JOIN D ON C.id <http://c.id/> = D.A_ic;
> That means there are now 5000 x 1000 = 5000000 (5.0e+06) or 1/20th the
> number of comparisons to run. Computing tables C and D should be in linear
> or logarithmic time (because you should have good index coverage) so there
> will usually be a net gain in performance. This is the secret to stepwize
> result construction.
> To help you to optimize your particular query, I would need to see it and
> the table definitions it is working against (SHOW CREATE TABLE works best
> for me).
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine