Many Thanks Peter, I appreciate your response.
Played around with the indexes, and modified the query a bit more to
match the campaignId of the value first and got a 2000x performance
increase from the original query...now it takes .03 seconds on average
where it used to take 60.
One question though: is there a benefit to doing the INNER JOIN in the
FROM clause rather than creating a join condition in the WHERE clause? I
rewrote the query a bit using the WHERE join condition and noticed a
slight performance hit on this particular query...
Also, I am doing some pretty hefty multi-dimensional analysis on the
data such as getting all the hits for each of m-variables with n-values
each within a date range that also contain certain other varname/value
combinations. Now if I am doing multiple different queries on this data,
using different fields of the table in each one, is it preferable to
create an index for each query, or make one uber-index that can be used
as a swiss-army knife for at least a couple of them? I don't want to
slow down INSERT calls if I can help it, but I don't want my queries to
take 5 minutes each either...
Perhaps the solution is two different databases, one without indexes for
inserts, one with indexes for pulling the data back out? Then I can use
indexes when I need to and don't have the performance hit when I need to
insert data.
The last question I have regards EXPLAIN results...
Am I correct in believing that each row corresponds to a different
comparison in the WHERE clause, and that each row analysis is done on
each of the rows before it (associative)? So if rows is 10,10,10 for 3
joins, then in actuality 1000 rows will be examined? If this is the
case, then I imagine it is better to winnow to the smallest possible set
first and then go from there trying to get as few rows examined as
theoretically possible given the constraints of the query.
~Mathew
Peter Brawley wrote:
> >I have a gut feeling that this kind of join should be able to be
> >done with similar speed without having to use a temp table
>
> Yep but remember the query engine uses one index per table so without
> seeing your EXPLAIN output I'd try indexing ...
> the data table on name_id,value_id,campaign_id,
> the names table on id,name,
> the values table on id,value,
> then writing the query as ...
> SELECT COUNT(*)
> FROM data
> INNER JOIN names ON data.name_id=names.id
> INNER JOIN values ON data.value_id=values.id
> WHERE data.campaign_id = 22
> AND names.name = 'content'
> AND values.value = 'index'
>
> PB
>
> -----
> Mathew Ray wrote:
>
>> Newbie on the list here having a bit of confusion at the moment why an
>> INNER JOIN is taking so long... I have replaced a few column names to
>> make it a bit more succinct:
>>
>>
>> SELECT COUNT(*)
>> FROM data, values, names
>>
>> WHERE data.campaign_id = 22
>> AND names.name = 'content'
>> AND values.value = 'index'
>>
>> AND data.name_id = names.id
>> AND data.value_id = value.id;
>>
>>
>> This query should pull out all of the index content from the data
>> table for campaign 22. name_id and value_id are indexed, as are the
>> name and value fields of the names and values tables. campaign_id is
>> also indexed in the data table and each name and value is unique per
>> campaign. The vardata dataset for this campaign that has around 163000
>> entries and the above query takes nearly a minute to run. Total size
>> of data table is around 3 million records.
>>
>> On the same machine, the following query takes roughly 2 seconds to run:
>>
>>
>> CREATE TEMPORARY TABLE IF NOT EXISTS names_temp
>> SELECT names.id as var_id, values.id as val_id
>> FROM values, names
>>
>> WHERE names.campaign_id = 22
>> AND values.campaign_id = names.campaign_id
>>
>> AND names.name = 'content'
>> AND values.value = 'index';
>>
>> SELECT COUNT(*)
>> FROM vardata, names_temp
>> WHERE vardata.varNameId = names_temp.var_id
>> AND vardata.varValueId = names_temp.val_id;
>>
>>
>> After looking at EXPLAIN for both, I understand that the latter is
>> faster because it is doing lookups based on constant values, but I
>> have a gut feeling that this kind of join should be able to be done
>> with similar speed without having to use a temp table... Is there any
>> way to optimize the performance of the join query without having to go
>> with the two-query option?
>>
>>
>>
>
>