List:General Discussion« Previous MessageNext Message »
From:Mathew Ray Date:February 15 2005 5:23pm
Subject:Re: join speed vs. 2 queries
View as plain text  
Many Thanks Peter, I appreciate your response.

Played around with the indexes, and modified the query a bit more to 
match the campaignId of the value first and got a 2000x performance 
increase from the original query...now it takes .03 seconds on average 
where it used to take 60.

One question though: is there a benefit to doing the INNER JOIN in the 
FROM clause rather than creating a join condition in the WHERE clause? I 
rewrote the query a bit using the WHERE join condition and noticed a 
slight performance hit on this particular query...

Also, I am doing some pretty hefty multi-dimensional analysis on the 
data such as getting all the hits for each of m-variables with n-values 
each within a date range that also contain certain other varname/value 
combinations. Now if I am doing multiple different queries on this data, 
  using different fields of the table in each one, is it preferable to 
create an index for each query, or make one uber-index that can be used 
as a swiss-army knife for at least a couple of them? I don't want to 
slow down INSERT calls if I can help it, but I don't want my queries to 
take 5 minutes each either...

Perhaps the solution is two different databases, one without indexes for 
inserts, one with indexes for pulling the data back out? Then I can use 
indexes when I need to and don't have the performance hit when I need to 
insert data.

The last question I have regards EXPLAIN results...
Am I correct in believing that each row corresponds to a different 
comparison in the WHERE clause, and that each row analysis is done on 
each of the rows before it (associative)? So if rows is 10,10,10 for 3 
joins, then in actuality 1000 rows will be examined? If this is the 
case, then I imagine it is better to winnow to the smallest possible set 
first and then go from there trying to get as few rows examined as 
theoretically possible given the constraints of the query.

~Mathew



Peter Brawley wrote:
>  >I have  a gut feeling that this kind of join should be able to be
>  >done with similar speed without having to use a temp table
> 
> Yep but remember the query engine uses one index per table so without 
> seeing your EXPLAIN output I'd try indexing ...
>   the data table on name_id,value_id,campaign_id,
>   the names table on id,name,
>   the values table on id,value,
> then writing the query as ...
>  SELECT COUNT(*)
>   FROM data
>    INNER JOIN names ON data.name_id=names.id
>    INNER JOIN values ON data.value_id=values.id
>  WHERE data.campaign_id = 22
>    AND names.name = 'content'
>    AND values.value = 'index'
> 
> PB
> 
> -----
> Mathew Ray wrote:
> 
>> Newbie on the list here having a bit of confusion at the moment why an 
>> INNER JOIN is taking so long... I have replaced a few column names to 
>> make it a bit more succinct:
>>
>>
>> SELECT COUNT(*)
>> FROM data, values, names
>>
>> WHERE data.campaign_id = 22
>> AND names.name = 'content'
>> AND values.value = 'index'
>>
>> AND data.name_id = names.id
>> AND data.value_id = value.id;
>>
>>
>> This query should pull out all of the index content from the data 
>> table for campaign 22. name_id and value_id are indexed, as are the 
>> name and value fields of the names and values tables. campaign_id is 
>> also indexed  in the data table and each name and value is unique per 
>> campaign. The vardata dataset for this campaign that has around 163000 
>> entries and the above query takes nearly a minute to run. Total size 
>> of data table is around 3 million records.
>>
>> On the same machine, the following query takes roughly 2 seconds to run:
>>
>>
>> CREATE TEMPORARY TABLE IF NOT EXISTS names_temp
>> SELECT names.id as var_id, values.id as val_id
>> FROM values, names
>>
>> WHERE names.campaign_id = 22
>> AND values.campaign_id = names.campaign_id
>>
>> AND names.name = 'content'
>> AND values.value = 'index';
>>
>> SELECT COUNT(*)
>> FROM vardata, names_temp
>> WHERE vardata.varNameId = names_temp.var_id
>> AND vardata.varValueId = names_temp.val_id;
>>
>>
>> After looking at EXPLAIN for both, I understand that the latter is 
>> faster because it is doing lookups based on constant values, but I 
>> have  a gut feeling that this kind of join should be able to be done 
>> with similar speed without having to use a temp table... Is there any 
>> way to optimize the performance of the join query without having to go 
>> with the two-query option?
>>
>>
>>
> 
> 


Thread
join speed vs. 2 queriesMathew Ray14 Feb
  • Re: join speed vs. 2 queriesPeter Brawley15 Feb
    • Re: join speed vs. 2 queriesMathew Ray15 Feb
      • Re: join speed vs. 2 queriesPeter Brawley15 Feb
  • timezone questionsElim Qiu15 Feb