MySQL Lists are EOL. Please join:

List:General Discussion« Previous MessageNext Message »
From:Mathew Ray Date:February 17 2005 2:49pm
Subject:Re: select where multiple joined records match
View as plain text  
I am guessing the long duration is caused by having to do complete table 
scans. How big is your dataset?

What about creating another index in resource_goals that includes 
GoalNumber and ResourceID? Perhaps even Subject, Grade, and 
NumericGrade.... As I learned just a couple days ago, making sure your 
JOIN conditions and WHERE clause can refer to an index can speed up 
queries 1000x or more.

Try tacking an EXPLAIN before your select and see how many rows MySQL 
things are being examined... optimally these should be very low, which 
indexes may be able to help with.

Thanks,
Mathew

AM Thomas wrote:
> Hi there,
> 
> I'll answer your questions below, but I should tell you that it looks 
> like  even three or four "joins" seem to make this too slow.  
> Theoretically, the  user could select up to 15 or so GoalNumber values, 
> 12 for Grade, and  about 20 possibilities for Subject - clearly it 
> wouldn't be useful to  specify that many items in a search, but that's 
> the theoretical maximum; 4  or 5 values isn't unreasonable, though.
> 
> Four "AND"ed goal numbers plus a subject and a grade slowed the search 
> (on  the shared commercial web host I'm using) into the 3 minute range, 
> and  that's with a regular join, not a left join.  This is the SELECT 
> that took  about 3 minutes (3 trials, simplifying slightly each time, 
> simplest given  here):
> 
> select r.id from resources as r
>   join resource_goals as g0 on (r.id=g0.ResourceID)
>   join resource_goals as g1 on (r.id=g1.ResourceID and g1.GoalNumber=1)
>   join resource_goals as g2 on (r.id=g2.ResourceID and g2.GoalNumber=2)
>   join resource_goals as g3 on (r.id=g3.ResourceID and g3.GoalNumber=3)
>   join resource_goals as g4 on (r.id=g4.ResourceID and g4.GoalNumber=4)
> where ((g0.Subject='Social_Studies')
>   and (g0.Grade='4th'))
> group by r.id;
> 
> The fastest time was 2 min 48 sec.
> Last time (simplest query) was 3 min 2 sec.
> 
> 
> I'm really running out of time on this project, so I just went ahead 
> and  made the user interface such that users can only select one 
> subject,  grade, and/or goal number at a time.  It's probably a sound 

> decision from  a usability perspective, so I'm not too sad.  If I decide 
> to make this  work in the future, I'd probably have to just do a SELECT 
> for each "AND"ed  field, get the list of resource id's for each SELECT, 
> then find the  intersection of the lists in Perl.
> 
> If I could speed this up with some kind of indexing, I'd love to know  
> about it.
> 
> The GROUP BY phrase is because I wanted just one row per resource.  It  
> seemed like I'd get a row for each condition/resource (didn't test it 
> with  the final ).  I'm actually doing SELECT * FROM... in my code, and 
> not  using the resource_goals information in my output (that's a 
> separate view  at present, generated by different Perl code).
> 
> It seems to work fine without the NOT NULL parts, you're right.  I was  
> wondering about that, but was sleepy enough at the time that I didn't  
> trust my thinking.
> 
> Sorry about not including my table defs :-(  .  I guess I was just 
> hoping  for a general approach, and didn't realize that anyone would be 
> interested  enough to read all that detail and provide and exact 
> solution for me.  Of  course, now I realize that it would have 
> simplified our discussion.   Anyway, late but not never, and for help to 
> whoever finds this in the list  archives someday, here are my table defs 
> (you're right - the goal number  is a TINYINT):
> 
> (Below is an abridged version of the resources table ; it also contains  
> about 60 more TINYINT fields which are essentially used as booleans, 
> some  of which I hope to eliminate.  Yes, I could have used SET or 
> something,  but I didn't for various reasons.)
> 
> CREATE TABLE resources (
>   id INT UNSIGNED PRIMARY KEY,
>   Title TEXT,
>   ResourceType_THJHArticle TINYINT,
>   ResourceType_NIEArticle TINYINT,
>   DataEntryName TINYTEXT,
>   Date DATETIME,
>   Notes TEXT,
>   Made_Keywords TEXT);
> 
> CREATE TABLE  resource_goals (
>   goal_id INT UNSIGNED PRIMARY KEY,
>   ResourceID INT,
>   Grade TINYTEXT,
>   Subject TINYTEXT,
>   GoalNumber TINYINT,
>   NumericGrade TINYINT);
> 
> 
> Thanks a bunch for your help; I'm finding this more interesting than I  
> thought I would.
> 
> 
> 
> 
> On Wed, 16 Feb 2005 11:08:20 -0500, Michael Stassen  
> <Michael.Stassen@stripped> wrote:
> 
>>
>> AM Thomas wrote:
>>
>>> Guarded exclamations of success!
>>>  This seems to be working for me so far - thank you!  Here's an 
>>> actual   example of a working query on my database; the field names 
>>> are a  little  different (sorry, I was trying to make my earlier 
>>> example more  readable).   The main change, though, is that I did 
>>> plain 'join'  instead of 'left  join', which seems to make it much 
>>> faster.  It was  pretty slow at first.
>>
>>
>> Yes, LEFT JOIN does extra work, and it wasn't needed here.
>>
>>> mysql> select r.id, ga.Grade, ga.GoalNumber, gb.GoalNumber from  
>>> resources  as r
>>>        join resource_goals as ga on r.id=ga.ResourceID
>>>        and ga.Grade='4th' and ga.GoalNumber='1'
>>>        join resource_goals as gb on r.id=gb.ResourceID
>>>        and gb.Grade='4th' and gb.GoalNumber='2'
>>>        where ga.goal_id IS NOT NULL
>>>        and gb.goal_id IS NOT NULL
>>>        group by r.id;
>>
>>
>> There are some strange things here, I think.
>>
>> * You've never shown us your table definitions, but I would have  
>> expected GoalNumber to be an integer, not a string.  If so, you  
>> shouldn't quote the numbers you compare it to.
>>
>> * I think it unlikely that you have rows with values in Grade,  
>> GoalNumber, and ResourceID which have NULL for goal_id.  Isn't 
>> goal_id  the primary key?   If I'm right, you don't need your WHERE 
>> clause.
>>
>> * Why have you added "GROUP BY r.id"?  Was that an attempt to fix  
>> something?   If we've got the query right, there should be no need 
>> for  grouping.  If you don't get the result you want without the GROUP 
>> BY,  then you should let us know, because that would mean we've 
>> missed  something.
>>
>> Putting those together, I'd expect
>>
>>    SELECT r.id, ga.Grade, ga.GoalNumber, gb.GoalNumber
>>    FROM resources AS r
>>    JOIN resource_goals AS ga
>>       ON r.id = ga.ResourceID
>>      AND ga.Grade = '4th'
>>      AND ga.GoalNumber = 1
>>    JOIN resource_goals AS gb
>>       ON r.id = gb.ResourceID
>>      AND gb.Grade = '4th'
>>      AND gb.GoalNumber = 2
>>
>> to do the job.  Does it?
>>
>> Michael
>>
>>
> 
> 
> 


Thread
select where multiple joined records matchAM Thomas13 Feb
  • Re: select where multiple joined records matchPeter Brawley13 Feb
    • Re: select where multiple joined records matchMichael Stassen13 Feb
  • Re: select where multiple joined records matchMichael Stassen13 Feb
    • more complexity (was: select where multiple joined records match)AM Thomas14 Feb
  • RE: select where multiple joined records matchGordon14 Feb
    • Re: select where multiple joined records matchAM Thomas14 Feb
      • Re: select where multiple joined records matchJeremy Cole14 Feb
        • Re: select where multiple joined records matchAM Thomas15 Feb
          • Re: select where multiple joined records matchMichael Stassen16 Feb
            • Re: select where multiple joined records matchAM Thomas16 Feb
              • Re: select where multiple joined records matchMathew Ray17 Feb
              • Re: select where multiple joined records matchMichael Stassen21 Feb
                • Re: select where multiple joined records matchAM Thomas21 Feb
        • Re: select where multiple joined records matchMichael Stassen16 Feb
Re: more complexity (was: select where multiple joined records match)Harald Fuchs14 Feb