List:General Discussion« Previous MessageNext Message »
From:Mihail Manolov Date:September 8 2011 6:47pm
Subject:Re: Query Optimization
View as plain text  
If you're running version 5.1+ you may wanna take a look at table partitioning options you
may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:

> Thanks for the reply Andy.  Unfortunately the users will be selecting varying date
> ranges and new data is constantly coming in, so I am not sure how I could archive/cache
> the necessary data that would be any more efficient than simply using the database
> directly.
> 
> 
> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>> Thinking outside the query, is there any archiving that could happen to make
>> your large tables kinder in the range scan?
>> 
>> Andy
>> 
>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps<bphelps@stripped>  wrote:
>> 
>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>> 
>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>> 
>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>> 
>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>> ...
>>>>>> 
>>>>>>>> WHERE
>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>=
> '2011-08-30
>>>>>> 00:00:00')
>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<=
> '2011-08-30
>>>>>> 12:36:53')
>>>>>> 
>>>>>>> In that case your logic here simplifies to:
>>>>>>> WHERE
>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>> AND
>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>> 
>>>>>>> Now add an index over open_dt and close_dt and see what
> happens.
>>>>>> 
>>>>>>> Jochem
>>>>>> 
>>>>>> Jochem,
>>>>>> 
>>>>>> I can't really use your WHERE logic because I also need to
> retrieve
>>>>>> results where the open_dt time is out of the range specified.
> For
>>>>>> example, a very large file download might span multiple days so
> given
>>>>>> your logic if the connection was started 2 days ago and I want to
> pull 1
>>>>>> days worth of connections, I would miss that entry. Basically I
> want to
>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>> specified time period, ie. if any activity happened between my
> start and
>>>>>> end dates, I need to see that record.
>>>>>> 
>>>>>> Any other ideas?
>>>>>> 
>>>>>> 
>>>>> I believe Jochem was on the right track but he got his dates
> reversed.
>>>>> 
>>>>> Let's try a little ASCII art to show the situation. I will setup a
> query
>>>>> window with two markers (s) and (e). Events will be marked by |----|
> markers
>>>>> showing their durations.
>>>>> 
>>>>> a) (s) (e)
>>>>> b) |---|
>>>>> c) |---|
>>>>> d) |---|
>>>>> e) |--------------------|
>>>>> f) |---|
>>>>> g) |---|
>>>>> 
>>>>> To describe these situations:
>>>>> a) is the window for which you want to query (s) is the starting time
> and
>>>>> (e) is the ending time for the date range you are interested in.
>>>>> b) the event starts and stops before your window exists. It won't be
> part
>>>>> of your results.
>>>>> c) the event starts before the window but ends within the window -
>>>>> include this
>>>>> d) the event starts and ends within the window - include this
>>>>> e) the event starts before the window and ends after the window -
> include
>>>>> this
>>>>> f) the event starts inside the window but ends beyond the window -
>>>>> include this.
>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>> 
>>>>> In order to get every event in the range of c-f, here is what you
> need
>>>>> for a WHERE clause
>>>>> 
>>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>> 
>>>>> Try that and let us know the results.
>>>>> 
>>>> 
>>>> Thanks Jochem and Shawn, however the following two queries result in the
>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for
> you)
>>>> 
>>>> Old method:
>>>> SELECT
>>>> sc.open_dt,
>>>> sc.close_dt,
>>>> sc.protocol,
>>>> INET_NTOA(sc.src_address) AS src_address,
>>>> sc.src_port,
>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>> sc.dst_port,
>>>> sc.sent,
>>>> sc.rcvd,
>>>> spm.desc AS src_port_desc,
>>>> dpm.desc AS dst_port_desc
>>>> FROM firewall_connections AS sc
>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>> WHERE
>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31
> 09:53:31')
>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>>> 09:53:31')
>>>> ORDER BY rcvd DESC
>>>> LIMIT 0, 10;
>>>> 
>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>> ndx_open_close_dt):
>>>> SELECT
>>>> sc.open_dt,
>>>> sc.close_dt,
>>>> sc.protocol,
>>>> INET_NTOA(sc.src_address) AS src_address,
>>>> sc.src_port,
>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>> sc.dst_port,
>>>> sc.sent,
>>>> sc.rcvd,
>>>> spm.desc AS src_port_desc,
>>>> dpm.desc AS dst_port_desc
>>>> FROM firewall_connections AS sc
>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>> WHERE
>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31
> 09:53:31'
>>>> ORDER BY rcvd DESC
>>>> LIMIT 0, 10;
>>>> 
>>>> EXPLAIN output for old method:
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | id | select_type | table | type | possible_keys | key | key_len | ref
> |
>>>> rows | Extra |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>> NULL | 10 | Using where |
>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port
> |
>>>> 1 | |
>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port
> |
>>>> 1 | |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> 
>>>> EXPLAIN output for new method with new index:
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | id | select_type | table | type | possible_keys | key | key_len | ref
> |
>>>> rows | Extra |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>> NULL | 10 | Using where |
>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port
> |
>>>> 1 | |
>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port
> |
>>>> 1 | |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> 
>>>> SHOW INDEX:
>>>> +----------------------+------**------+-------------------+---**
>>>> -----------+-------------+----**-------+-------------+--------**
>>>> --+--------+------+-----------**-+---------+
>>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation
> |
>>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>> +----------------------+------**------+-------------------+---**
>>>> -----------+-------------+----**-------+-------------+--------**
>>>> --+--------+------+-----------**-+---------+
>>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A |
> 1342691
>>>> | NULL | NULL | | BTREE | |
>>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A |
>>>> 6377783 | NULL | NULL | | BTREE | |
>>>> +----------------------+------**------+-------------------+---**
>>>> -----------+-------------+----**-------+-------------+--------**
>>>> --+--------+------+-----------**-+---------+
>>>> 
>>>> 
>>>> Although right now the queries do seem to be executing much faster,
>>>> although I'm not quite sure why. And I'm not sure why the new
>>>> ndx_open_close_dt isn't being used either.
>>>> 
>>>> -Brandon
>>>> 
>>>> 
>>> I am still having a big issue with my query as seen above.  The table is up
>>> to around 32 million records at the moment and either of the two SELECT
>>> queries above take a very long time to run.  Is there anything at all I can
>>> do to speed things up?  It seems that changing the format of the WHERE
>>> clause did not help at all, as the EXPLAIN output is exactly the same for
>>> both version.  I also tried adding an index on (open_dt, close_dt, rcvd) but
>>> that index does not get used.
>>> 
>>> Any other ideas?
>>> 
>>> Thanks in advance,
>>> 
>>> 
>>> Brandon
>>> 
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe:    http://lists.mysql.com/mysql?**
>>> unsub=eroomydna@stripped<http://lists.mysql.com/mysql?unsub=1
>>> 
>>> 
>> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
> 

Thread
Query OptimizationBrandon Phelps30 Aug
  • Re: Query OptimizationMichael Dykman30 Aug
  • Re: Query OptimizationJochem van Dieten1 Sep
    • Re: Query OptimizationBrandon Phelps1 Sep
      • Re: Query OptimizationMySQL)1 Sep
        • Re: Query OptimizationBrandon Phelps1 Sep
          • Re: Query OptimizationBrandon Phelps8 Sep
            • Re: Query OptimizationAndrew Moore8 Sep
              • Re: Query OptimizationBrandon Phelps8 Sep
                • Re: Query OptimizationAndrew Moore8 Sep
                  • Re: Query OptimizationBrandon Phelps8 Sep
                    • Re: Query OptimizationDerek Downey8 Sep
                      • Re: Query OptimizationBrandon Phelps8 Sep
                        • Re: Query OptimizationMihail Manolov8 Sep
                          • Re: Query OptimizationBrandon Phelps8 Sep
                            • Re: Query OptimizationMihail Manolov8 Sep
                              • Re: Query OptimizationBrandon Phelps8 Sep
                • Re: Query OptimizationMihail Manolov8 Sep
                  • Re: Query OptimizationBrandon Phelps8 Sep
                    • Re: Query OptimizationAndrew Moore8 Sep
                      • Re: Query OptimizationBrandon Phelps8 Sep