I don't think it will be any better to count distinct values. I think
the query is just slow because the index lookups are slow. Is the
'word' column really 150 bytes? That's probably the culprit. How slow
is this, by the way? 370k rows in one table, verifying the
non-existence of index records in a 4M-row table with 150-byte index
values... what does "slow" mean for your application? How big is the
index for the 4M-row table (use SHOW TABLE STATUS)?
Russell Uman wrote:
>
> There's no "using distinct", but there is "not exists", and in fact no
> rows are
> returned. Slow query log reports "#Query_time: 94 Lock_time: 0
> Rows_sent: 0
> Rows_examined: 370220"
>
> EXPLAIN:
> id select_type table type possible_keys key
> key_len ref rows Extra
> 1 SIMPLE t1 index NULL PRIMARY 150 NULL
> 338451 Using index
> 1 SIMPLE t2 ref word word 150
> t2.field 4
> Using where; Using index; Not exists
>
> These are two search tables (hence the large key_len i believe), one
> with ~400K
> rows, one row per search term the other with ~4M rows, relating search
> terms to
> content.
>
> Perhaps I could optimize by doing a count(distinct) on each table and only
> running the expensive query if the counts don't match?
>
> Would I see any benefit by making these InnoDB tables?
>
> Thanks for your help with this!
>
> Baron Schwartz wrote:
>> Hi,
>>
>> That is the right way, but if you show us the exact output of EXPLAIN
>> we can
> help more. In particular, does it say "Using distinct/not exists" in
> Extra?
>>
>> Russell Uman wrote:
>>>
>>> howdy.
>>>
>>> i trying to find items in one table that don't exist in another.
>>> i'm using a left join with a where clause to do it:
>>>
>>> SELECT t1.field, t2.field FROM table1 t1 LEFT JOIN table2 t2 ON
>>> t1.word = t2.word WHERE t2.word IS NULL;
>>>
>>> both tables are quite large and the query is quite slow.
>>>
>>> the field column is indexed in both tables, and explain shows the
>>> indexes being used.
>>>
>>> is there a better way to construct this kind of query?
>