From: Date: April 16 2008 1:19pm Subject: Re: select does too much work to find rows where primary key does not match List-Archive: http://lists.mysql.com/mysql/212315 Message-Id: <4805E0C9.6060400@mysql.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: quoted-printable Hi Patrick, all ! Patrick J. McEvoy wrote: > I have two MyISAM tables; each uses 'phone' as a primary key. Finding=20 > rows where the primary keys match is efficient: >=20 > mysql> explain select bar.phone from foo,bar where foo.phone=3Dbar.phon= e; > [[...]} Ok, let us take some simple example. Say tables "foo" and "bar" both=20 have three rows each, with "phone" values 1, 2, and 3. Then your matching lines will be: foo.phone bar.phone 1 1 2 2 3 3 Column foo.phone is shown for explanation only, your select would not=20 return it. Each individual value in bar.phone will be returned as often as there is = an identical value in foo.phone. I trust that is close to what you expect. My sample data here do not show what will happen if values in foo.phone=20 are not distinct - figure yourself. >=20 >=20 > Finding rows in one table that do not match a row in the other table is= =20 > wildly inefficient: >=20 > mysql> explain select bar.phone from foo,bar where foo.phone!=3Dbar.pho= ne; > [[...]] Your resulting data from this select would be: foo.phone bar.phone 1 2 1 3 2 1 2 3 3 1 3 2 Again, foo.phone is shown for explanation only. >=20 > (This is the same for 'NOT', '!=3D', or '<>'.) Correct. >=20 > The amount of work should be identical in both cases: grab a row, look = > up by primary key in the other table, proceed. No, it isn't: A "select ... from foo, bar where CONDITION" effectively creates the=20 cartesian product of both tables and then removes all lines=20 (combinations) which do not meet the condition. Of course, the system uses better strategies if possible, evaluating=20 indexes etc, but the resulting data will be the same. Assuming tables "foo" and "bar" each have a column "num" with the values = 1 to 100, a condition "... where foo.num =3D bar.num" will lead to a=20 result with 100 rows. But "... where foo.num !=3D bar.num" will lead to a table of 9,900 rows: For each of the 100 values in "foo.num", there will be 99 entries in=20 "bar.num" that satisfy the inequality condition. >=20 > My real goal is to delete rows in the smaller table if there is no matc= h=20 > in the larger table: >=20 > delete from bar using foo,bar where not bar.phone=3Dfoo.phone; See above - wrong approach. What you need is a subquery or an outer join, as proposed in the other=20 replies. >=20 > [[...]] HTH, J=F6rg --=20 Joerg Bruehe, Senior Production Engineer MySQL AB, www.mysql.com