If you're running version 5.1+ you may wanna take a look at table partitioning options you
may have.
On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:
> Thanks for the reply Andy. Unfortunately the users will be selecting varying date
> ranges and new data is constantly coming in, so I am not sure how I could archive/cache
> the necessary data that would be any more efficient than simply using the database
> directly.
>
>
> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>> Thinking outside the query, is there any archiving that could happen to make
>> your large tables kinder in the range scan?
>>
>> Andy
>>
>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps<bphelps@stripped> wrote:
>>
>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>
>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>
>>>>> On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>
>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>> ...
>>>>>>
>>>>>>>> WHERE
>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR close_dt>=
> '2011-08-30
>>>>>> 00:00:00')
>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR close_dt<=
> '2011-08-30
>>>>>> 12:36:53')
>>>>>>
>>>>>>> In that case your logic here simplifies to:
>>>>>>> WHERE
>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>> AND
>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>>
>>>>>>> Now add an index over open_dt and close_dt and see what
> happens.
>>>>>>
>>>>>>> Jochem
>>>>>>
>>>>>> Jochem,
>>>>>>
>>>>>> I can't really use your WHERE logic because I also need to
> retrieve
>>>>>> results where the open_dt time is out of the range specified.
> For
>>>>>> example, a very large file download might span multiple days so
> given
>>>>>> your logic if the connection was started 2 days ago and I want to
> pull 1
>>>>>> days worth of connections, I would miss that entry. Basically I
> want to
>>>>>> SELECT all of the records that were opened OR closed during the
>>>>>> specified time period, ie. if any activity happened between my
> start and
>>>>>> end dates, I need to see that record.
>>>>>>
>>>>>> Any other ideas?
>>>>>>
>>>>>>
>>>>> I believe Jochem was on the right track but he got his dates
> reversed.
>>>>>
>>>>> Let's try a little ASCII art to show the situation. I will setup a
> query
>>>>> window with two markers (s) and (e). Events will be marked by |----|
> markers
>>>>> showing their durations.
>>>>>
>>>>> a) (s) (e)
>>>>> b) |---|
>>>>> c) |---|
>>>>> d) |---|
>>>>> e) |--------------------|
>>>>> f) |---|
>>>>> g) |---|
>>>>>
>>>>> To describe these situations:
>>>>> a) is the window for which you want to query (s) is the starting time
> and
>>>>> (e) is the ending time for the date range you are interested in.
>>>>> b) the event starts and stops before your window exists. It won't be
> part
>>>>> of your results.
>>>>> c) the event starts before the window but ends within the window -
>>>>> include this
>>>>> d) the event starts and ends within the window - include this
>>>>> e) the event starts before the window and ends after the window -
> include
>>>>> this
>>>>> f) the event starts inside the window but ends beyond the window -
>>>>> include this.
>>>>> g) the event starts and ends beyond the window - exclude this.
>>>>>
>>>>> In order to get every event in the range of c-f, here is what you
> need
>>>>> for a WHERE clause
>>>>>
>>>>> WHERE start<= (ending time) and end>= (starting time)
>>>>>
>>>>> Try that and let us know the results.
>>>>>
>>>>
>>>> Thanks Jochem and Shawn, however the following two queries result in the
>>>> exact same EXPLAIN output: (I hope the tables don't wrap too early for
> you)
>>>>
>>>> Old method:
>>>> SELECT
>>>> sc.open_dt,
>>>> sc.close_dt,
>>>> sc.protocol,
>>>> INET_NTOA(sc.src_address) AS src_address,
>>>> sc.src_port,
>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>> sc.dst_port,
>>>> sc.sent,
>>>> sc.rcvd,
>>>> spm.desc AS src_port_desc,
>>>> dpm.desc AS dst_port_desc
>>>> FROM firewall_connections AS sc
>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>> WHERE
>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>= '2011-08-31
> 09:53:31')
>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<= '2011-09-01
>>>> 09:53:31')
>>>> ORDER BY rcvd DESC
>>>> LIMIT 0, 10;
>>>>
>>>> New method with BTREE index on open_dt, close_dt (index name is
>>>> ndx_open_close_dt):
>>>> SELECT
>>>> sc.open_dt,
>>>> sc.close_dt,
>>>> sc.protocol,
>>>> INET_NTOA(sc.src_address) AS src_address,
>>>> sc.src_port,
>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>> sc.dst_port,
>>>> sc.sent,
>>>> sc.rcvd,
>>>> spm.desc AS src_port_desc,
>>>> dpm.desc AS dst_port_desc
>>>> FROM firewall_connections AS sc
>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>> WHERE
>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>= '2011-08-31
> 09:53:31'
>>>> ORDER BY rcvd DESC
>>>> LIMIT 0, 10;
>>>>
>>>> EXPLAIN output for old method:
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | id | select_type | table | type | possible_keys | key | key_len | ref
> |
>>>> rows | Extra |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>> NULL | 10 | Using where |
>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port
> |
>>>> 1 | |
>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port
> |
>>>> 1 | |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>>
>>>> EXPLAIN output for new method with new index:
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | id | select_type | table | type | possible_keys | key | key_len | ref
> |
>>>> rows | Extra |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd | 4 |
>>>> NULL | 10 | Using where |
>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.src_port
> |
>>>> 1 | |
>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | syslog.sc.dst_port
> |
>>>> 1 | |
>>>> +----+-------------+-------+--**------+-----------------------**
>>>> ----+----------+---------+----**----------------+------+------**-------+
>>>>
>>>> SHOW INDEX:
>>>> +----------------------+------**------+-------------------+---**
>>>> -----------+-------------+----**-------+-------------+--------**
>>>> --+--------+------+-----------**-+---------+
>>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation
> |
>>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment |
>>>> +----------------------+------**------+-------------------+---**
>>>> -----------+-------------+----**-------+-------------+--------**
>>>> --+--------+------+-----------**-+---------+
>>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A |
> 1342691
>>>> | NULL | NULL | | BTREE | |
>>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A |
>>>> 6377783 | NULL | NULL | | BTREE | |
>>>> +----------------------+------**------+-------------------+---**
>>>> -----------+-------------+----**-------+-------------+--------**
>>>> --+--------+------+-----------**-+---------+
>>>>
>>>>
>>>> Although right now the queries do seem to be executing much faster,
>>>> although I'm not quite sure why. And I'm not sure why the new
>>>> ndx_open_close_dt isn't being used either.
>>>>
>>>> -Brandon
>>>>
>>>>
>>> I am still having a big issue with my query as seen above. The table is up
>>> to around 32 million records at the moment and either of the two SELECT
>>> queries above take a very long time to run. Is there anything at all I can
>>> do to speed things up? It seems that changing the format of the WHERE
>>> clause did not help at all, as the EXPLAIN output is exactly the same for
>>> both version. I also tried adding an index on (open_dt, close_dt, rcvd) but
>>> that index does not get used.
>>>
>>> Any other ideas?
>>>
>>> Thanks in advance,
>>>
>>>
>>> Brandon
>>>
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe: http://lists.mysql.com/mysql?**
>>> unsub=eroomydna@stripped<http://lists.mysql.com/mysql?unsub=1
>>>
>>>
>>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql?unsub=1
>