List:General Discussion« Previous MessageNext Message »
From:Peter Brawley Date:February 15 2005 7:04pm
Subject:Re: join speed vs. 2 queries
View as plain text  
Matthew,

 >...is there a benefit to doing the INNER JOIN in the FROM clause
 >rather than creating a join condition in the WHERE clause? I rewrote
 >the query a bit using the WHERE join condition and noticed a slight
 >performance hit on this particular query...

There are (at least) three benefits to putting JOINs in the FROM clause: 
clarity for you and anyone who reads it, maintainability by you or 
anyone who follows you, and you've removed one kind of guessing from the 
query engine's joblist.

 >Also, I am doing some pretty hefty multi-dimensional analysis on the data
 >such as getting all the hits for each of m-variables with n-values 
each within
 >a date range that also contain certain other varname/value combinations.
 >Now if I am doing multiple different queries on this data,  using 
different fields
 >of the table in each one, is it preferable to create an index for each 
query, or
 >make one uber-index that can be used as a swiss-army knife for at least a
 >couple of them? I don't want to slow down INSERT calls if I can help it,
 >but I don't want my queries to take 5 minutes each either...

If you can get one multi-col index to work for most queries, you're 
golden. Failing that  ...

 >Perhaps the solution is two different databases, one without indexes 
for inserts,
 >one with indexes for pulling the data back out? Then I can use indexes 
when
 >I need to and don't have the performance hit when I need to insert data.

... an OLTP db for updates, an OLAP db for reports will often SYA.

Yes, winnow down starting at the top.

PB

Mathew Ray wrote:

> Many Thanks Peter, I appreciate your response.
>
> Played around with the indexes, and modified the query a bit more to 
> match the campaignId of the value first and got a 2000x performance 
> increase from the original query...now it takes .03 seconds on average 
> where it used to take 60.
>
> One question though: is there a benefit to doing the INNER JOIN in the 
> FROM clause rather than creating a join condition in the WHERE clause? 
> I rewrote the query a bit using the WHERE join condition and noticed a 
> slight performance hit on this particular query...
>
> Also, I am doing some pretty hefty multi-dimensional analysis on the 
> data such as getting all the hits for each of m-variables with 
> n-values each within a date range that also contain certain other 
> varname/value combinations. Now if I am doing multiple different 
> queries on this data,  using different fields of the table in each 
> one, is it preferable to create an index for each query, or make one 
> uber-index that can be used as a swiss-army knife for at least a 
> couple of them? I don't want to slow down INSERT calls if I can help 
> it, but I don't want my queries to take 5 minutes each either...
>
> Perhaps the solution is two different databases, one without indexes 
> for inserts, one with indexes for pulling the data back out? Then I 
> can use indexes when I need to and don't have the performance hit when 
> I need to insert data.
>
> The last question I have regards EXPLAIN results...
> Am I correct in believing that each row corresponds to a different 
> comparison in the WHERE clause, and that each row analysis is done on 
> each of the rows before it (associative)? So if rows is 10,10,10 for 3 
> joins, then in actuality 1000 rows will be examined? If this is the 
> case, then I imagine it is better to winnow to the smallest possible 
> set first and then go from there trying to get as few rows examined as 
> theoretically possible given the constraints of the query.
>
> ~Mathew
>
>
>
> Peter Brawley wrote:
>
>>  >I have  a gut feeling that this kind of join should be able to be
>>  >done with similar speed without having to use a temp table
>>
>> Yep but remember the query engine uses one index per table so without 
>> seeing your EXPLAIN output I'd try indexing ...
>>   the data table on name_id,value_id,campaign_id,
>>   the names table on id,name,
>>   the values table on id,value,
>> then writing the query as ...
>>  SELECT COUNT(*)
>>   FROM data
>>    INNER JOIN names ON data.name_id=names.id
>>    INNER JOIN values ON data.value_id=values.id
>>  WHERE data.campaign_id = 22
>>    AND names.name = 'content'
>>    AND values.value = 'index'
>>
>> PB
>>
>> -----
>> Mathew Ray wrote:
>>
>>> Newbie on the list here having a bit of confusion at the moment why 
>>> an INNER JOIN is taking so long... I have replaced a few column 
>>> names to make it a bit more succinct:
>>>
>>>
>>> SELECT COUNT(*)
>>> FROM data, values, names
>>>
>>> WHERE data.campaign_id = 22
>>> AND names.name = 'content'
>>> AND values.value = 'index'
>>>
>>> AND data.name_id = names.id
>>> AND data.value_id = value.id;
>>>
>>>
>>> This query should pull out all of the index content from the data 
>>> table for campaign 22. name_id and value_id are indexed, as are the 
>>> name and value fields of the names and values tables. campaign_id is 
>>> also indexed  in the data table and each name and value is unique 
>>> per campaign. The vardata dataset for this campaign that has around 
>>> 163000 entries and the above query takes nearly a minute to run. 
>>> Total size of data table is around 3 million records.
>>>
>>> On the same machine, the following query takes roughly 2 seconds to 
>>> run:
>>>
>>>
>>> CREATE TEMPORARY TABLE IF NOT EXISTS names_temp
>>> SELECT names.id as var_id, values.id as val_id
>>> FROM values, names
>>>
>>> WHERE names.campaign_id = 22
>>> AND values.campaign_id = names.campaign_id
>>>
>>> AND names.name = 'content'
>>> AND values.value = 'index';
>>>
>>> SELECT COUNT(*)
>>> FROM vardata, names_temp
>>> WHERE vardata.varNameId = names_temp.var_id
>>> AND vardata.varValueId = names_temp.val_id;
>>>
>>>
>>> After looking at EXPLAIN for both, I understand that the latter is 
>>> faster because it is doing lookups based on constant values, but I 
>>> have  a gut feeling that this kind of join should be able to be done 
>>> with similar speed without having to use a temp table... Is there 
>>> any way to optimize the performance of the join query without having 
>>> to go with the two-query option?
>>>
>>>
>>>
>>
>>
>
>
>
>


-- 
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.300 / Virus Database: 265.8.8 - Release Date: 2/14/2005

Thread
join speed vs. 2 queriesMathew Ray14 Feb
  • Re: join speed vs. 2 queriesPeter Brawley15 Feb
    • Re: join speed vs. 2 queriesMathew Ray15 Feb
      • Re: join speed vs. 2 queriesPeter Brawley15 Feb
  • timezone questionsElim Qiu15 Feb