List:General Discussion« Previous MessageNext Message »
From:Derek Downey Date:September 8 2011 6:55pm
Subject:Re: Query Optimization
View as plain text  
Correct me if I'm wrong. You're wanting to get all records that have an open_date or a
close_date between two times.

If that's correct, you might be able to get an index_merge by doing a query like:

WHERE ((starting time)<=open_dt<= (ending time)) OR ((starting
time)<=close_dt<=(ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html

Regards,
Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:

> Andy,
> 
> The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu server
> 11.04.  Unfortunately the machine only has 2GB of RAM but no other major daemons are
> running on the machine.  We are running RAID 1 (mirroring) with 1TB drives.  The tables in
> question here are all MyISAM.  When running with the LIMIT 10 my EXPLAIN is:
> 
>
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
> | id | select_type | table | type   | possible_keys       | key                 |
> key_len | ref                | rows     | Extra                       |
>
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
> |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8  
>     | NULL               | 32393316 | Using where; Using filesort |
> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2  
>     | syslog.sc.src_port |        1 |                             |
> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2  
>     | syslog.sc.dst_port |        1 |                             |
>
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
> 
> When I remove the LIMIT 10 I get:
> 
>
> ----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
> | id | select_type | table | type   | possible_keys       | key     | key_len | ref  
>              | rows     | Extra                       |
>
> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
> |  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    | NULL 
>              | 32393330 | Using where; Using filesort |
> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       |
> syslog.sc.src_port |        1 |                             |
> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       |
> syslog.sc.dst_port |        1 |                             |
>
> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
> 
> Thanks for all your help thus far.
> 
> On 09/08/2011 02:38 PM, Andrew Moore wrote:
>> I don't think I saw any query timings in the emails (maybe I missed them).
>> 
>> What version of MySQL are you currently using?
>> What does the explain look like when your remove the limit 10?
>> Is your server tuned for MyISAM or InnoDB?
>> What kind of disk setup is in use?
>> How much memory is in your machine?
>> 
>> 
>> On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps<bphelps@stripped>  wrote:
>> 
>>> Thanks for the reply Andy.  Unfortunately the users will be selecting
>>> varying date ranges and new data is constantly coming in, so I am not sure
>>> how I could archive/cache the necessary data that would be any more
>>> efficient than simply using the database directly.
>>> 
>>> 
>>> 
>>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>> 
>>>> Thinking outside the query, is there any archiving that could happen to
>>>> make
>>>> your large tables kinder in the range scan?
>>>> 
>>>> Andy
>>>> 
>>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps<bphelps@stripped>  
> wrote:
>>>> 
>>>>  On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>> 
>>>>>  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>> 
>>>>>>  On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>> 
>>>>>>>  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>> 
>>>>>>>>> ...
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>  WHERE
>>>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR
> close_dt>= '2011-08-30
>>>>>>>>>> 
>>>>>>>>> 00:00:00')
>>>>>>>> 
>>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR
> close_dt<= '2011-08-30
>>>>>>>>>> 
>>>>>>>>> 12:36:53')
>>>>>>>> 
>>>>>>>>  In that case your logic here simplifies to:
>>>>>>>>> WHERE
>>>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>>>> AND
>>>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>>>>> 
>>>>>>>> 
>>>>>>>>  Now add an index over open_dt and close_dt and see what
> happens.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>>  Jochem
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jochem,
>>>>>>>> 
>>>>>>>> I can't really use your WHERE logic because I also need
> to retrieve
>>>>>>>> results where the open_dt time is out of the range
> specified. For
>>>>>>>> example, a very large file download might span multiple
> days so given
>>>>>>>> your logic if the connection was started 2 days ago and I
> want to pull
>>>>>>>> 1
>>>>>>>> days worth of connections, I would miss that entry.
> Basically I want
>>>>>>>> to
>>>>>>>> SELECT all of the records that were opened OR closed
> during the
>>>>>>>> specified time period, ie. if any activity happened
> between my start
>>>>>>>> and
>>>>>>>> end dates, I need to see that record.
>>>>>>>> 
>>>>>>>> Any other ideas?
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  I believe Jochem was on the right track but he got his
> dates
>>>>>>> reversed.
>>>>>>> 
>>>>>>> Let's try a little ASCII art to show the situation. I will
> setup a
>>>>>>> query
>>>>>>> window with two markers (s) and (e). Events will be marked by
> |----|
>>>>>>> markers
>>>>>>> showing their durations.
>>>>>>> 
>>>>>>> a) (s) (e)
>>>>>>> b) |---|
>>>>>>> c) |---|
>>>>>>> d) |---|
>>>>>>> e) |--------------------|
>>>>>>> f) |---|
>>>>>>> g) |---|
>>>>>>> 
>>>>>>> To describe these situations:
>>>>>>> a) is the window for which you want to query (s) is the
> starting time
>>>>>>> and
>>>>>>> (e) is the ending time for the date range you are interested
> in.
>>>>>>> b) the event starts and stops before your window exists. It
> won't be
>>>>>>> part
>>>>>>> of your results.
>>>>>>> c) the event starts before the window but ends within the
> window -
>>>>>>> include this
>>>>>>> d) the event starts and ends within the window - include
> this
>>>>>>> e) the event starts before the window and ends after the
> window -
>>>>>>> include
>>>>>>> this
>>>>>>> f) the event starts inside the window but ends beyond the
> window -
>>>>>>> include this.
>>>>>>> g) the event starts and ends beyond the window - exclude
> this.
>>>>>>> 
>>>>>>> In order to get every event in the range of c-f, here is what
> you need
>>>>>>> for a WHERE clause
>>>>>>> 
>>>>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>>>> 
>>>>>>> Try that and let us know the results.
>>>>>>> 
>>>>>>> 
>>>>>> Thanks Jochem and Shawn, however the following two queries result
> in the
>>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too
> early for
>>>>>> you)
>>>>>> 
>>>>>> Old method:
>>>>>> SELECT
>>>>>> sc.open_dt,
>>>>>> sc.close_dt,
>>>>>> sc.protocol,
>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>> sc.src_port,
>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>> sc.dst_port,
>>>>>> sc.sent,
>>>>>> sc.rcvd,
>>>>>> spm.desc AS src_port_desc,
>>>>>> dpm.desc AS dst_port_desc
>>>>>> FROM firewall_connections AS sc
>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>> WHERE
>>>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31
> 09:53:31')
>>>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<=
> '2011-09-01
>>>>>> 09:53:31')
>>>>>> ORDER BY rcvd DESC
>>>>>> LIMIT 0, 10;
>>>>>> 
>>>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>>>> ndx_open_close_dt):
>>>>>> SELECT
>>>>>> sc.open_dt,
>>>>>> sc.close_dt,
>>>>>> sc.protocol,
>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>> sc.src_port,
>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>> sc.dst_port,
>>>>>> sc.sent,
>>>>>> sc.rcvd,
>>>>>> spm.desc AS src_port_desc,
>>>>>> dpm.desc AS dst_port_desc
>>>>>> FROM firewall_connections AS sc
>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>> WHERE
>>>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31
> 09:53:31'
>>>>>> ORDER BY rcvd DESC
>>>>>> LIMIT 0, 10;
>>>>>> 
>>>>>> EXPLAIN output for old method:
>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> | id | select_type | table | type | possible_keys | key | key_len
> | ref
>>>>>> |
>>>>>> rows | Extra |
>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd
> | 4 |
>>>>>> NULL | 10 | Using where |
>>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 |
> syslog.sc.src_port
>>>>>> |
>>>>>> 1 | |
>>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 |
> syslog.sc.dst_port
>>>>>> |
>>>>>> 1 | |
>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> 
>>>>>> EXPLAIN output for new method with new index:
>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> | id | select_type | table | type | possible_keys | key | key_len
> | ref
>>>>>> |
>>>>>> rows | Extra |
>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd
> | 4 |
>>>>>> NULL | 10 | Using where |
>>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 |
> syslog.sc.src_port
>>>>>> |
>>>>>> 1 | |
>>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 |
> syslog.sc.dst_port
>>>>>> |
>>>>>> 1 | |
>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>> ----+----------+---------+----****----------------+------+----**
>>>>>> --**-------+
>>>>>> 
>>>>>> SHOW INDEX:
>>>>>>
> +----------------------+------****------+-------------------+-**--**
>>>>>>
> -----------+-------------+----****-------+-------------+------**--**
>>>>>> --+--------+------+-----------****-+---------+
>>>>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name |
> Collation
>>>>>> |
>>>>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>>>>
> +----------------------+------****------+-------------------+-**--**
>>>>>>
> -----------+-------------+----****-------+-------------+------**--**
>>>>>> --+--------+------+-----------****-+---------+
>>>>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A
> |
>>>>>> 1342691
>>>>>> | NULL | NULL | | BTREE | |
>>>>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A
> |
>>>>>> 6377783 | NULL | NULL | | BTREE | |
>>>>>>
> +----------------------+------****------+-------------------+-**--**
>>>>>>
> -----------+-------------+----****-------+-------------+------**--**
>>>>>> --+--------+------+-----------****-+---------+
>>>>>> 
>>>>>> 
>>>>>> Although right now the queries do seem to be executing much
> faster,
>>>>>> although I'm not quite sure why. And I'm not sure why the new
>>>>>> ndx_open_close_dt isn't being used either.
>>>>>> 
>>>>>> -Brandon
>>>>>> 
>>>>>> 
>>>>>>  I am still having a big issue with my query as seen above.  The
> table
>>>>> is up
>>>>> to around 32 million records at the moment and either of the two
> SELECT
>>>>> queries above take a very long time to run.  Is there anything at all
> I
>>>>> can
>>>>> do to speed things up?  It seems that changing the format of the
> WHERE
>>>>> clause did not help at all, as the EXPLAIN output is exactly the same
> for
>>>>> both version.  I also tried adding an index on (open_dt, close_dt,
> rcvd)
>>>>> but
>>>>> that index does not get used.
>>>>> 
>>>>> Any other ideas?
>>>>> 
>>>>> Thanks in advance,
>>>>> 
>>>>> 
>>>>> Brandon
>>>>> 
>>>>> --
>>>>> MySQL General Mailing List
>>>>> For list archives: http://lists.mysql.com/mysql
>>>>> To unsubscribe:   
> http://lists.mysql.com/mysql?****<http://lists.mysql.com/mysql?**>
>>>>> unsub=eroomydna@stripped<http**://lists.mysql.com/mysql?**
>>>>> unsub=eroomydna@stripped<http://lists.mysql.com/mysql?unsub=1
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe:    http://lists.mysql.com/mysql?**
>>> unsub=eroomydna@stripped<http://lists.mysql.com/mysql?unsub=1
>>> 
>>> 
>> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
> 

Thread
Query OptimizationBrandon Phelps30 Aug
  • Re: Query OptimizationMichael Dykman30 Aug
  • Re: Query OptimizationJochem van Dieten1 Sep
    • Re: Query OptimizationBrandon Phelps1 Sep
      • Re: Query OptimizationMySQL)1 Sep
        • Re: Query OptimizationBrandon Phelps1 Sep
          • Re: Query OptimizationBrandon Phelps8 Sep
            • Re: Query OptimizationAndrew Moore8 Sep
              • Re: Query OptimizationBrandon Phelps8 Sep
                • Re: Query OptimizationAndrew Moore8 Sep
                  • Re: Query OptimizationBrandon Phelps8 Sep
                    • Re: Query OptimizationDerek Downey8 Sep
                      • Re: Query OptimizationBrandon Phelps8 Sep
                        • Re: Query OptimizationMihail Manolov8 Sep
                          • Re: Query OptimizationBrandon Phelps8 Sep
                            • Re: Query OptimizationMihail Manolov8 Sep
                              • Re: Query OptimizationBrandon Phelps8 Sep
                • Re: Query OptimizationMihail Manolov8 Sep
                  • Re: Query OptimizationBrandon Phelps8 Sep
                    • Re: Query OptimizationAndrew Moore8 Sep
                      • Re: Query OptimizationBrandon Phelps8 Sep