Hi.
On Tue, Aug 22, 2000 at 12:22:05PM -0700, dave-mlist@stripped wrote:
> I have a big table A, a big table B, and a little table C.
> I want to find the first 1000 rows in A that don't exist in B or C.
>
> I know how to find the first 1000 rows in A that don't exist in B:
>
> INSERT INTO D
> SELECT a.*
> FROM a NATURAL LEFT JOIN b
> WHERE b.some_id IS NULL
> ORDER BY a.time DESC
> LIMIT 1000;
>
> The above works fine for me. But now I have to take C into account.
> If I have to separate the B pass into an intermediate stage, I may
> have to keep repeating B passes until the C pass has enough rows in
> it. So, I would like to do just a single step to avoid having all
> that extra logic.
>
> Is the following correct if I also have C:
>
> INSERT INTO D
> SELECT a.*
> FROM a NATURAL LEFT JOIN b NATURAL LEFT JOIN c
> WHERE b.some_id IS NULL AND c.some_id IS NULL
> ORDER BY a.time DESC
> LIMIT 1000;
Yes, this should be correct. Why didn't you just try?
> What about this:
>
> INSERT INTO D
> SELECT a.*
> FROM a NATURAL LEFT JOIN b, c
> WHERE b.some_id IS NULL AND c.some_id IS NULL
> ORDER BY a.time DESC
> LIMIT 1000;
No, C.some_id would never be NULL (because it isn't a LEFT JOIN). Your
JOIN would get the rows which are in A and also C, with B.some_id
showing whether they appear in B. And the first part of the WHERE
clause would throw away all rows which exist in B, and the second part
would throw away all the rest, because there are no columns where
C.some_id is NULL.
Hm. Again, why didn't you test this? Maybe my 'theory' has a bug, but
if I am not wrong, a simple test would have shown you, that the result
set you get is empty and therefore not what you want.
Bye,
Benjamin.