List:General Discussion« Previous MessageNext Message »
From:Brent Baisley Date:July 12 2006 1:53pm
Subject:Re: Self Join Performance
View as plain text  
Interesting setup. You're using one more join than you need to. Your query should look
like this:
SELECT
 DISTINCT(fvr.DocumentID)
FROM
 FieldValueRelation fvr
INNER JOIN
 FieldValueRelation fvr2
 ON  fvr.DocumentID = fvr2.DocumentID
 AND fvr2.FieldValueID = '1569'
WHERE  fvr1.FieldValueID = '1344'

Your query was so slow because you were first selecting ALL the records from
FieldValueRelation, then using a join to filter out the 
records. You should be using a WHERE clause to filter out the first data set.

----- Original Message ----- 
From: "Andrew Wood" <andrew.d.wood@stripped>
To: <mysql@stripped>
Sent: Tuesday, July 11, 2006 9:50 PM
Subject: Self Join Performance


> Hello everyone!
>
> I've got a few questions regarding optimizing self-joins.
> So I've got these three tables:
>
> mysql> describe FieldName;
> +-------------+------------+------+-----+---------+----------------+
> | Field       | Type       | Null | Key | Default | Extra          |
> +-------------+------------+------+-----+---------+----------------+
> | FieldNameID | bigint(20) | NO   | PRI | NULL    | auto_increment |
> | Name        | char(255)  | YES  |     | NULL    |                |
> +-------------+------------+------+-----+---------+----------------+
> 2 rows in set (0.02 sec)
>
> mysql> describe FieldValue;
> +--------------+------------+------+-----+---------+----------------+
> | Field        | Type       | Null | Key | Default | Extra          |
> +--------------+------------+------+-----+---------+----------------+
> | FieldValueID | bigint(20) | NO   | PRI | NULL    | auto_increment |
> | FieldNameID  | bigint(20) | NO   | MUL | NULL    |                |
> | Value        | char(255)  | YES  | MUL | NULL    |                |
> +--------------+------------+------+-----+---------+----------------+
> 3 rows in set (0.00 sec)
>
> mysql> describe FieldValueRelation;
> +--------------+-------------+------+-----+---------+-------+
> | Field        | Type        | Null | Key | Default | Extra |
> +--------------+-------------+------+-----+---------+-------+
> | FieldValueID | bigint(20)  | NO   | MUL | NULL    |       |
> | DocumentID   | bigint(20)  | NO   | MUL | NULL    |       |
> | FieldSetID   | tinyint(11) | YES  |     | NULL    |       |
> +--------------+-------------+------+-----+---------+-------+
> 3 rows in set (0.00 sec)
>
>
> I've ran a script to populate the database with a decent set of
> randomized data (~7 million DocumentIDs).  The following query may
> clear things up a bit:
>
>
> mysql> select * from FieldName, FieldValue, FieldValueRelation where
> DocumentID = '7000000' and FieldValue.FieldValueID =
> FieldValueRelation.FieldValueID and FieldName.FieldNameID =
> FieldValue.FieldNameID;
>
> +-------------+-----------+--------------+-------------+----------+--------------+------------+------------+
> | FieldNameID | Name      | FieldValueID | FieldNameID | Value    |
> FieldValueID | DocumentID | FieldSetID |
>
> +-------------+-----------+--------------+-------------+----------+--------------+------------+------------+
> |           1 | Account   |      6737063 |           1 | 88116010 |
>  6737063 |    7000000 |       NULL |
> |           2 | FirstName |         1344 |           2 | Noelle   |
>     1344 |    7000000 |          1 |
> |           3 | LastName  |         1569 |           3 | Shea     |
>     1569 |    7000000 |          1 |
>
> +-------------+-----------+--------------+-------------+----------+--------------+------------+------------+
> 3 rows in set (0.00 sec)
>
>
>
> So here's my question:  I want to be able to find all of the rows in
> the FieldValueRelation table that have both a FirstName of 'Noelle'
> and a LastName of 'Shea'.  I've tried a few things, all of which have
> less than stellar performance.  I'm guesstimating that I'll have to do
> a self join on the FieldValueRelation table. Alas, it's pretty darn
> slow:
>
> SELECT
> DISTINCT(fvr.DocumentID)
> FROM
> FieldValueRelation fvr
> INNER JOIN
> FieldValueRelation fvr1
> ON fvr.DocumentID = fvr1.DocumentID
> AND fvr1.FieldValueID = '1344'
> INNER JOIN
> FieldValueRelation fvr2
> ON  fvr.DocumentID = fvr2.DocumentID
> AND fvr2.FieldValueID = '1569'
>
> And when I execute it:
>
> +------------+
> | DocumentID |
> +------------+
> |     162955 |
> |     721704 |
> |     993290 |
> |    1606157 |
> |    2459823 |
> |    2759626 |
> |    3949779 |
> |    5192230 |
> |    5753563 |
> |    6616602 |
> |    7000000 |
> +------------+
> 11 rows in set (37.33 sec)
>
> 37 seconds is a bit long for what I'd like to use this schema for. So
> here's the counts for the matching FieldValueIDs (if this helps):
>
> mysql> select count(*) from FieldValueRelation where FieldValueID = '1569';
> +----------+
> | count(*) |
> +----------+
> |     7753 |
> +----------+
> 1 row in set (0.05 sec)
>
> mysql> select count(*) from FieldValueRelation where FieldValueID = '1344';
> +----------+
> | count(*) |
> +----------+
> |     7357 |
> +----------+
> 1 row in set (0.01 sec)
>
>
> Can anyone shed any light on this?
>
> --
> Cheers,
> -Andrew
>
>
> All generalizations, with the possible exception of this one, are false.
> - Kurt Goedel
>
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
> 

Thread
Self Join PerformanceAndrew Wood12 Jul
  • Re: Self Join PerformanceBrent Baisley12 Jul