From: Peter Brawley Date: February 15 2005 7:04pm Subject: Re: join speed vs. 2 queries List-Archive: http://lists.mysql.com/mysql/179973 Message-Id: <421247D2.8050600@earthlink.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Matthew, >...is there a benefit to doing the INNER JOIN in the FROM clause >rather than creating a join condition in the WHERE clause? I rewrote >the query a bit using the WHERE join condition and noticed a slight >performance hit on this particular query... There are (at least) three benefits to putting JOINs in the FROM clause: clarity for you and anyone who reads it, maintainability by you or anyone who follows you, and you've removed one kind of guessing from the query engine's joblist. >Also, I am doing some pretty hefty multi-dimensional analysis on the data >such as getting all the hits for each of m-variables with n-values each within >a date range that also contain certain other varname/value combinations. >Now if I am doing multiple different queries on this data, using different fields >of the table in each one, is it preferable to create an index for each query, or >make one uber-index that can be used as a swiss-army knife for at least a >couple of them? I don't want to slow down INSERT calls if I can help it, >but I don't want my queries to take 5 minutes each either... If you can get one multi-col index to work for most queries, you're golden. Failing that ... >Perhaps the solution is two different databases, one without indexes for inserts, >one with indexes for pulling the data back out? Then I can use indexes when >I need to and don't have the performance hit when I need to insert data. ... an OLTP db for updates, an OLAP db for reports will often SYA. Yes, winnow down starting at the top. PB Mathew Ray wrote: > Many Thanks Peter, I appreciate your response. > > Played around with the indexes, and modified the query a bit more to > match the campaignId of the value first and got a 2000x performance > increase from the original query...now it takes .03 seconds on average > where it used to take 60. > > One question though: is there a benefit to doing the INNER JOIN in the > FROM clause rather than creating a join condition in the WHERE clause? > I rewrote the query a bit using the WHERE join condition and noticed a > slight performance hit on this particular query... > > Also, I am doing some pretty hefty multi-dimensional analysis on the > data such as getting all the hits for each of m-variables with > n-values each within a date range that also contain certain other > varname/value combinations. Now if I am doing multiple different > queries on this data, using different fields of the table in each > one, is it preferable to create an index for each query, or make one > uber-index that can be used as a swiss-army knife for at least a > couple of them? I don't want to slow down INSERT calls if I can help > it, but I don't want my queries to take 5 minutes each either... > > Perhaps the solution is two different databases, one without indexes > for inserts, one with indexes for pulling the data back out? Then I > can use indexes when I need to and don't have the performance hit when > I need to insert data. > > The last question I have regards EXPLAIN results... > Am I correct in believing that each row corresponds to a different > comparison in the WHERE clause, and that each row analysis is done on > each of the rows before it (associative)? So if rows is 10,10,10 for 3 > joins, then in actuality 1000 rows will be examined? If this is the > case, then I imagine it is better to winnow to the smallest possible > set first and then go from there trying to get as few rows examined as > theoretically possible given the constraints of the query. > > ~Mathew > > > > Peter Brawley wrote: > >> >I have a gut feeling that this kind of join should be able to be >> >done with similar speed without having to use a temp table >> >> Yep but remember the query engine uses one index per table so without >> seeing your EXPLAIN output I'd try indexing ... >> the data table on name_id,value_id,campaign_id, >> the names table on id,name, >> the values table on id,value, >> then writing the query as ... >> SELECT COUNT(*) >> FROM data >> INNER JOIN names ON data.name_id=names.id >> INNER JOIN values ON data.value_id=values.id >> WHERE data.campaign_id = 22 >> AND names.name = 'content' >> AND values.value = 'index' >> >> PB >> >> ----- >> Mathew Ray wrote: >> >>> Newbie on the list here having a bit of confusion at the moment why >>> an INNER JOIN is taking so long... I have replaced a few column >>> names to make it a bit more succinct: >>> >>> >>> SELECT COUNT(*) >>> FROM data, values, names >>> >>> WHERE data.campaign_id = 22 >>> AND names.name = 'content' >>> AND values.value = 'index' >>> >>> AND data.name_id = names.id >>> AND data.value_id = value.id; >>> >>> >>> This query should pull out all of the index content from the data >>> table for campaign 22. name_id and value_id are indexed, as are the >>> name and value fields of the names and values tables. campaign_id is >>> also indexed in the data table and each name and value is unique >>> per campaign. The vardata dataset for this campaign that has around >>> 163000 entries and the above query takes nearly a minute to run. >>> Total size of data table is around 3 million records. >>> >>> On the same machine, the following query takes roughly 2 seconds to >>> run: >>> >>> >>> CREATE TEMPORARY TABLE IF NOT EXISTS names_temp >>> SELECT names.id as var_id, values.id as val_id >>> FROM values, names >>> >>> WHERE names.campaign_id = 22 >>> AND values.campaign_id = names.campaign_id >>> >>> AND names.name = 'content' >>> AND values.value = 'index'; >>> >>> SELECT COUNT(*) >>> FROM vardata, names_temp >>> WHERE vardata.varNameId = names_temp.var_id >>> AND vardata.varValueId = names_temp.val_id; >>> >>> >>> After looking at EXPLAIN for both, I understand that the latter is >>> faster because it is doing lookups based on constant values, but I >>> have a gut feeling that this kind of join should be able to be done >>> with similar speed without having to use a temp table... Is there >>> any way to optimize the performance of the join query without having >>> to go with the two-query option? >>> >>> >>> >> >> > > > > -- No virus found in this outgoing message. Checked by AVG Anti-Virus. Version: 7.0.300 / Virus Database: 265.8.8 - Release Date: 2/14/2005