List:General Discussion« Previous MessageNext Message »
From:Brandon Phelps Date:September 8 2011 6:27pm
Subject:Re: Query Optimization
View as plain text  
Thanks for the reply Andy.  Unfortunately the users will be selecting varying date ranges
and new data is constantly coming in, so I am not sure how I could archive/cache the
necessary data that would be any more efficient than simply using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:
> Thinking outside the query, is there any archiving that could happen to make
> your large tables kinder in the range scan?
>
> Andy
>
> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps<bphelps@stripped>  wrote:
>
>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>
>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>
>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>
>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>> ...
>>>>>
>>>>>>> WHERE
>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>=
> '2011-08-30
>>>>> 00:00:00')
>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<=
> '2011-08-30
>>>>> 12:36:53')
>>>>>
>>>>>> In that case your logic here simplifies to:
>>>>>> WHERE
>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>> AND
>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>
>>>>>> Now add an index over open_dt and close_dt and see what happens.
>>>>>
>>>>>> Jochem
>>>>>
>>>>> Jochem,
>>>>>
>>>>> I can't really use your WHERE logic because I also need to retrieve
>>>>> results where the open_dt time is out of the range specified. For
>>>>> example, a very large file download might span multiple days so
> given
>>>>> your logic if the connection was started 2 days ago and I want to
> pull 1
>>>>> days worth of connections, I would miss that entry. Basically I want
> to
>>>>> SELECT all of the records that were opened OR closed during the
>>>>> specified time period, ie. if any activity happened between my start
> and
>>>>> end dates, I need to see that record.
>>>>>
>>>>> Any other ideas?
>>>>>
>>>>>
>>>> I believe Jochem was on the right track but he got his dates reversed.
>>>>
>>>> Let's try a little ASCII art to show the situation. I will setup a query
>>>> window with two markers (s) and (e). Events will be marked by |----|
> markers
>>>> showing their durations.
>>>>
>>>> a) (s) (e)
>>>> b) |---|
>>>> c) |---|
>>>> d) |---|
>>>> e) |--------------------|
>>>> f) |---|
>>>> g) |---|
>>>>
>>>> To describe these situations:
>>>> a) is the window for which you want to query (s) is the starting time
> and
>>>> (e) is the ending time for the date range you are interested in.
>>>> b) the event starts and stops before your window exists. It won't be
> part
>>>> of your results.
>>>> c) the event starts before the window but ends within the window -
>>>> include this
>>>> d) the event starts and ends within the window - include this
>>>> e) the event starts before the window and ends after the window -
> include
>>>> this
>>>> f) the event starts inside the window but ends beyond the window -
>>>> include this.
>>>> g) the event starts and ends beyond the window - exclude this.
>>>>
>>>> In order to get every event in the range of c-f, here is what you need
>>>> for a WHERE clause
>>>>
>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>
>>>> Try that and let us know the results.
>>>>
>>>
>>> Thanks Jochem and Shawn, however the following two queries result in the
>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for you)
>>>
>>> Old method:
>>> SELECT
>>> sc.open_dt,
>>> sc.close_dt,
>>> sc.protocol,
>>> INET_NTOA(sc.src_address) AS src_address,
>>> sc.src_port,
>>> INET_NTOA(sc.dst_address) AS dst_address,
>>> sc.dst_port,
>>> sc.sent,
>>> sc.rcvd,
>>> spm.desc AS src_port_desc,
>>> dpm.desc AS dst_port_desc
>>> FROM firewall_connections AS sc
>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>> WHERE
>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31 09:53:31')
>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>> 09:53:31')
>>> ORDER BY rcvd DESC
>>> LIMIT 0, 10;
>>>
>>> New method with BTREE index on open_dt, close_dt (index name is
>>> ndx_open_close_dt):
>>> SELECT
>>> sc.open_dt,
>>> sc.close_dt,
>>> sc.protocol,
>>> INET_NTOA(sc.src_address) AS src_address,
>>> sc.src_port,
>>> INET_NTOA(sc.dst_address) AS dst_address,
>>> sc.dst_port,
>>> sc.sent,
>>> sc.rcvd,
>>> spm.desc AS src_port_desc,
>>> dpm.desc AS dst_port_desc
>>> FROM firewall_connections AS sc
>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>> WHERE
>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31 09:53:31'
>>> ORDER BY rcvd DESC
>>> LIMIT 0, 10;
>>>
>>> EXPLAIN output for old method:
>>> +----+-------------+-------+--**------+-----------------------**
>>> ----+----------+---------+----**----------------+------+------**-------+
>>> | id | select_type | table | type | possible_keys | key | key_len | ref |
>>> rows | Extra |
>>> +----+-------------+-------+--**------+-----------------------**
>>> ----+----------+---------+----**----------------+------+------**-------+
>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>> NULL | 10 | Using where |
>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |
>>> 1 | |
>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |
>>> 1 | |
>>> +----+-------------+-------+--**------+-----------------------**
>>> ----+----------+---------+----**----------------+------+------**-------+
>>>
>>> EXPLAIN output for new method with new index:
>>> +----+-------------+-------+--**------+-----------------------**
>>> ----+----------+---------+----**----------------+------+------**-------+
>>> | id | select_type | table | type | possible_keys | key | key_len | ref |
>>> rows | Extra |
>>> +----+-------------+-------+--**------+-----------------------**
>>> ----+----------+---------+----**----------------+------+------**-------+
>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>> NULL | 10 | Using where |
>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port |
>>> 1 | |
>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port |
>>> 1 | |
>>> +----+-------------+-------+--**------+-----------------------**
>>> ----+----------+---------+----**----------------+------+------**-------+
>>>
>>> SHOW INDEX:
>>> +----------------------+------**------+-------------------+---**
>>> -----------+-------------+----**-------+-------------+--------**
>>> --+--------+------+-----------**-+---------+
>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation |
>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>> +----------------------+------**------+-------------------+---**
>>> -----------+-------------+----**-------+-------------+--------**
>>> --+--------+------+-----------**-+---------+
>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A | 1342691
>>> | NULL | NULL | | BTREE | |
>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A |
>>> 6377783 | NULL | NULL | | BTREE | |
>>> +----------------------+------**------+-------------------+---**
>>> -----------+-------------+----**-------+-------------+--------**
>>> --+--------+------+-----------**-+---------+
>>>
>>>
>>> Although right now the queries do seem to be executing much faster,
>>> although I'm not quite sure why. And I'm not sure why the new
>>> ndx_open_close_dt isn't being used either.
>>>
>>> -Brandon
>>>
>>>
>> I am still having a big issue with my query as seen above.  The table is up
>> to around 32 million records at the moment and either of the two SELECT
>> queries above take a very long time to run.  Is there anything at all I can
>> do to speed things up?  It seems that changing the format of the WHERE
>> clause did not help at all, as the EXPLAIN output is exactly the same for
>> both version.  I also tried adding an index on (open_dt, close_dt, rcvd) but
>> that index does not get used.
>>
>> Any other ideas?
>>
>> Thanks in advance,
>>
>>
>> Brandon
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:    http://lists.mysql.com/mysql?**
>> unsub=eroomydna@stripped<http://lists.mysql.com/mysql?unsub=1
>>
>>
>
Thread
Query OptimizationBrandon Phelps30 Aug
  • Re: Query OptimizationMichael Dykman30 Aug
  • Re: Query OptimizationJochem van Dieten1 Sep
    • Re: Query OptimizationBrandon Phelps1 Sep
      • Re: Query OptimizationMySQL)1 Sep
        • Re: Query OptimizationBrandon Phelps1 Sep
          • Re: Query OptimizationBrandon Phelps8 Sep
            • Re: Query OptimizationAndrew Moore8 Sep
              • Re: Query OptimizationBrandon Phelps8 Sep
                • Re: Query OptimizationAndrew Moore8 Sep
                  • Re: Query OptimizationBrandon Phelps8 Sep
                    • Re: Query OptimizationDerek Downey8 Sep
                      • Re: Query OptimizationBrandon Phelps8 Sep
                        • Re: Query OptimizationMihail Manolov8 Sep
                          • Re: Query OptimizationBrandon Phelps8 Sep
                            • Re: Query OptimizationMihail Manolov8 Sep
                              • Re: Query OptimizationBrandon Phelps8 Sep
                • Re: Query OptimizationMihail Manolov8 Sep
                  • Re: Query OptimizationBrandon Phelps8 Sep
                    • Re: Query OptimizationAndrew Moore8 Sep
                      • Re: Query OptimizationBrandon Phelps8 Sep