List:General Discussion« Previous MessageNext Message »
From:Brandon Phelps Date:September 8 2011 7:25pm
Subject:Re: Query Optimization
View as plain text  
Thanks for the idea Derek, however given the following query my EXPLAIN output is
identical:

SELECT
	sc.open_dt,
	sc.close_dt,
	sc.protocol,
	INET_NTOA(sc.src_address) AS src_address,
	sc.src_port,
	INET_NTOA(sc.dst_address) AS dst_address,
	sc.dst_port,
	sc.sent,
	sc.rcvd,
	spm.desc AS src_port_desc,
	dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
	LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
	LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07
13:18:58' <= close_dt <= '2011-09-08 13:18:58');


+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref             
  | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL            
  | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       |
syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       |
syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).



On 09/08/2011 02:55 PM, Derek Downey wrote:
> Correct me if I'm wrong. You're wanting to get all records that have an open_date or
> a close_date between two times.
>
> If that's correct, you might be able to get an index_merge by doing a query like:
>
> WHERE ((starting time)<=open_dt<= (ending time)) OR ((starting
> time)<=close_dt<=(ending time))
>
> and creating two indexes (one on 'open_dt' and the other on 'close_dt')
>
> http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html
>
> Regards,
> Derek
>
> On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:
>
>> Andy,
>>
>> The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu
> server 11.04.  Unfortunately the machine only has 2GB of RAM but no other major daemons
> are running on the machine.  We are running RAID 1 (mirroring) with 1TB drives.  The
> tables in question here are all MyISAM.  When running with the LIMIT 10 my EXPLAIN is:
>>
>>
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>> | id | select_type | table | type   | possible_keys       | key                 |
> key_len | ref                | rows     | Extra                       |
>>
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>> |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd |
> 8       | NULL               | 32393316 | Using where; Using filesort |
>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             |
> 2       | syslog.sc.src_port |        1 |                             |
>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             |
> 2       | syslog.sc.dst_port |        1 |                             |
>>
> +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
>>
>> When I remove the LIMIT 10 I get:
>>
>>
> ----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
>> | id | select_type | table | type   | possible_keys       | key     | key_len |
> ref                | rows     | Extra                       |
>>
> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
>> |  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL    | NULL    |
> NULL               | 32393330 | Using where; Using filesort |
>> |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY | 2       |
> syslog.sc.src_port |        1 |                             |
>> |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY | 2       |
> syslog.sc.dst_port |        1 |                             |
>>
> +----+-------------+-------+--------+---------------------+---------+---------+--------------------+----------+-----------------------------+
>>
>> Thanks for all your help thus far.
>>
>> On 09/08/2011 02:38 PM, Andrew Moore wrote:
>>> I don't think I saw any query timings in the emails (maybe I missed them).
>>>
>>> What version of MySQL are you currently using?
>>> What does the explain look like when your remove the limit 10?
>>> Is your server tuned for MyISAM or InnoDB?
>>> What kind of disk setup is in use?
>>> How much memory is in your machine?
>>>
>>>
>>> On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps<bphelps@stripped>  
> wrote:
>>>
>>>> Thanks for the reply Andy.  Unfortunately the users will be selecting
>>>> varying date ranges and new data is constantly coming in, so I am not
> sure
>>>> how I could archive/cache the necessary data that would be any more
>>>> efficient than simply using the database directly.
>>>>
>>>>
>>>>
>>>> On 09/08/2011 02:16 PM, Andrew Moore wrote:
>>>>
>>>>> Thinking outside the query, is there any archiving that could happen
> to
>>>>> make
>>>>> your large tables kinder in the range scan?
>>>>>
>>>>> Andy
>>>>>
>>>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps<bphelps@stripped>
>    wrote:
>>>>>
>>>>>   On 09/01/2011 01:32 PM, Brandon Phelps wrote:
>>>>>>
>>>>>>   On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
>>>>>>>
>>>>>>>   On 9/1/2011 09:42, Brandon Phelps wrote:
>>>>>>>>
>>>>>>>>   On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>   WHERE
>>>>>>>>>>> (open_dt>= '2011-08-30 00:00:00' OR
> close_dt>= '2011-08-30
>>>>>>>>>>>
>>>>>>>>>> 00:00:00')
>>>>>>>>>
>>>>>>>>>> AND (open_dt<= '2011-08-30 12:36:53' OR
> close_dt<= '2011-08-30
>>>>>>>>>>>
>>>>>>>>>> 12:36:53')
>>>>>>>>>
>>>>>>>>>   In that case your logic here simplifies to:
>>>>>>>>>> WHERE
>>>>>>>>>> open_dt>= '2011-08-30 00:00:00'
>>>>>>>>>> AND
>>>>>>>>>> close_dt<= '2011-08-30 12:36:53'
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   Now add an index over open_dt and close_dt and see
> what happens.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   Jochem
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jochem,
>>>>>>>>>
>>>>>>>>> I can't really use your WHERE logic because I also
> need to retrieve
>>>>>>>>> results where the open_dt time is out of the range
> specified. For
>>>>>>>>> example, a very large file download might span
> multiple days so given
>>>>>>>>> your logic if the connection was started 2 days ago
> and I want to pull
>>>>>>>>> 1
>>>>>>>>> days worth of connections, I would miss that entry.
> Basically I want
>>>>>>>>> to
>>>>>>>>> SELECT all of the records that were opened OR closed
> during the
>>>>>>>>> specified time period, ie. if any activity happened
> between my start
>>>>>>>>> and
>>>>>>>>> end dates, I need to see that record.
>>>>>>>>>
>>>>>>>>> Any other ideas?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   I believe Jochem was on the right track but he got
> his dates
>>>>>>>> reversed.
>>>>>>>>
>>>>>>>> Let's try a little ASCII art to show the situation. I
> will setup a
>>>>>>>> query
>>>>>>>> window with two markers (s) and (e). Events will be
> marked by |----|
>>>>>>>> markers
>>>>>>>> showing their durations.
>>>>>>>>
>>>>>>>> a) (s) (e)
>>>>>>>> b) |---|
>>>>>>>> c) |---|
>>>>>>>> d) |---|
>>>>>>>> e) |--------------------|
>>>>>>>> f) |---|
>>>>>>>> g) |---|
>>>>>>>>
>>>>>>>> To describe these situations:
>>>>>>>> a) is the window for which you want to query (s) is the
> starting time
>>>>>>>> and
>>>>>>>> (e) is the ending time for the date range you are
> interested in.
>>>>>>>> b) the event starts and stops before your window exists.
> It won't be
>>>>>>>> part
>>>>>>>> of your results.
>>>>>>>> c) the event starts before the window but ends within the
> window -
>>>>>>>> include this
>>>>>>>> d) the event starts and ends within the window - include
> this
>>>>>>>> e) the event starts before the window and ends after the
> window -
>>>>>>>> include
>>>>>>>> this
>>>>>>>> f) the event starts inside the window but ends beyond the
> window -
>>>>>>>> include this.
>>>>>>>> g) the event starts and ends beyond the window - exclude
> this.
>>>>>>>>
>>>>>>>> In order to get every event in the range of c-f, here is
> what you need
>>>>>>>> for a WHERE clause
>>>>>>>>
>>>>>>>> WHERE start<= (ending time) and end>= (starting
> time)
>>>>>>>>
>>>>>>>> Try that and let us know the results.
>>>>>>>>
>>>>>>>>
>>>>>>> Thanks Jochem and Shawn, however the following two queries
> result in the
>>>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too
> early for
>>>>>>> you)
>>>>>>>
>>>>>>> Old method:
>>>>>>> SELECT
>>>>>>> sc.open_dt,
>>>>>>> sc.close_dt,
>>>>>>> sc.protocol,
>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>> sc.src_port,
>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>> sc.dst_port,
>>>>>>> sc.sent,
>>>>>>> sc.rcvd,
>>>>>>> spm.desc AS src_port_desc,
>>>>>>> dpm.desc AS dst_port_desc
>>>>>>> FROM firewall_connections AS sc
>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>> WHERE
>>>>>>> (open_dt>= '2011-08-31 09:53:31' OR close_dt>=
> '2011-08-31 09:53:31')
>>>>>>> AND (open_dt<= '2011-09-01 09:53:31' OR close_dt<=
> '2011-09-01
>>>>>>> 09:53:31')
>>>>>>> ORDER BY rcvd DESC
>>>>>>> LIMIT 0, 10;
>>>>>>>
>>>>>>> New method with BTREE index on open_dt, close_dt (index name
> is
>>>>>>> ndx_open_close_dt):
>>>>>>> SELECT
>>>>>>> sc.open_dt,
>>>>>>> sc.close_dt,
>>>>>>> sc.protocol,
>>>>>>> INET_NTOA(sc.src_address) AS src_address,
>>>>>>> sc.src_port,
>>>>>>> INET_NTOA(sc.dst_address) AS dst_address,
>>>>>>> sc.dst_port,
>>>>>>> sc.sent,
>>>>>>> sc.rcvd,
>>>>>>> spm.desc AS src_port_desc,
>>>>>>> dpm.desc AS dst_port_desc
>>>>>>> FROM firewall_connections AS sc
>>>>>>> LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
>>>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
>>>>>>> WHERE
>>>>>>> open_dt<= '2011-09-01 09:53:31' AND close_dt>=
> '2011-08-31 09:53:31'
>>>>>>> ORDER BY rcvd DESC
>>>>>>> LIMIT 0, 10;
>>>>>>>
>>>>>>> EXPLAIN output for old method:
>>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>>>
> ----+----------+---------+----****----------------+------+----**
>>>>>>> --**-------+
>>>>>>> | id | select_type | table | type | possible_keys | key |
> key_len | ref
>>>>>>> |
>>>>>>> rows | Extra |
>>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>>>
> ----+----------+---------+----****----------------+------+----**
>>>>>>> --**-------+
>>>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt |
> ndx_rcvd | 4 |
>>>>>>> NULL | 10 | Using where |
>>>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 |
> syslog.sc.src_port
>>>>>>> |
>>>>>>> 1 | |
>>>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 |
> syslog.sc.dst_port
>>>>>>> |
>>>>>>> 1 | |
>>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>>>
> ----+----------+---------+----****----------------+------+----**
>>>>>>> --**-------+
>>>>>>>
>>>>>>> EXPLAIN output for new method with new index:
>>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>>>
> ----+----------+---------+----****----------------+------+----**
>>>>>>> --**-------+
>>>>>>> | id | select_type | table | type | possible_keys | key |
> key_len | ref
>>>>>>> |
>>>>>>> rows | Extra |
>>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>>>
> ----+----------+---------+----****----------------+------+----**
>>>>>>> --**-------+
>>>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt |
> ndx_rcvd | 4 |
>>>>>>> NULL | 10 | Using where |
>>>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 |
> syslog.sc.src_port
>>>>>>> |
>>>>>>> 1 | |
>>>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 |
> syslog.sc.dst_port
>>>>>>> |
>>>>>>> 1 | |
>>>>>>>
> +----+-------------+-------+--****------+---------------------**--**
>>>>>>>
> ----+----------+---------+----****----------------+------+----**
>>>>>>> --**-------+
>>>>>>>
>>>>>>> SHOW INDEX:
>>>>>>>
> +----------------------+------****------+-------------------+-**--**
>>>>>>>
> -----------+-------------+----****-------+-------------+------**--**
>>>>>>> --+--------+------+-----------****-+---------+
>>>>>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name
> | Collation
>>>>>>> |
>>>>>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment
> |
>>>>>>>
> +----------------------+------****------+-------------------+-**--**
>>>>>>>
> -----------+-------------+----****-------+-------------+------**--**
>>>>>>> --+--------+------+-----------****-+---------+
>>>>>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt
> | A |
>>>>>>> 1342691
>>>>>>> | NULL | NULL | | BTREE | |
>>>>>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt
> | A |
>>>>>>> 6377783 | NULL | NULL | | BTREE | |
>>>>>>>
> +----------------------+------****------+-------------------+-**--**
>>>>>>>
> -----------+-------------+----****-------+-------------+------**--**
>>>>>>> --+--------+------+-----------****-+---------+
>>>>>>>
>>>>>>>
>>>>>>> Although right now the queries do seem to be executing much
> faster,
>>>>>>> although I'm not quite sure why. And I'm not sure why the
> new
>>>>>>> ndx_open_close_dt isn't being used either.
>>>>>>>
>>>>>>> -Brandon
>>>>>>>
>>>>>>>
>>>>>>>   I am still having a big issue with my query as seen above. 
> The table
>>>>>> is up
>>>>>> to around 32 million records at the moment and either of the two
> SELECT
>>>>>> queries above take a very long time to run.  Is there anything at
> all I
>>>>>> can
>>>>>> do to speed things up?  It seems that changing the format of the
> WHERE
>>>>>> clause did not help at all, as the EXPLAIN output is exactly the
> same for
>>>>>> both version.  I also tried adding an index on (open_dt,
> close_dt, rcvd)
>>>>>> but
>>>>>> that index does not get used.
>>>>>>
>>>>>> Any other ideas?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>>
>>>>>> Brandon
>>>>>>
>>>>>> --
>>>>>> MySQL General Mailing List
>>>>>> For list archives: http://lists.mysql.com/mysql
>>>>>> To unsubscribe:   
> http://lists.mysql.com/mysql?****<http://lists.mysql.com/mysql?**>
>>>>>> unsub=eroomydna@stripped<http**://lists.mysql.com/mysql?**
>>>>>>
> unsub=eroomydna@stripped<http://lists.mysql.com/mysql?unsub=1
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> MySQL General Mailing List
>>>> For list archives: http://lists.mysql.com/mysql
>>>> To unsubscribe:    http://lists.mysql.com/mysql?**
>>>> unsub=eroomydna@stripped<http://lists.mysql.com/mysql?unsub=1
>>>>
>>>>
>>>
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
>>
>
>
Thread
Query OptimizationBrandon Phelps30 Aug
  • Re: Query OptimizationMichael Dykman30 Aug
  • Re: Query OptimizationJochem van Dieten1 Sep
    • Re: Query OptimizationBrandon Phelps1 Sep
      • Re: Query OptimizationMySQL)1 Sep
        • Re: Query OptimizationBrandon Phelps1 Sep
          • Re: Query OptimizationBrandon Phelps8 Sep
            • Re: Query OptimizationAndrew Moore8 Sep
              • Re: Query OptimizationBrandon Phelps8 Sep
                • Re: Query OptimizationAndrew Moore8 Sep
                  • Re: Query OptimizationBrandon Phelps8 Sep
                    • Re: Query OptimizationDerek Downey8 Sep
                      • Re: Query OptimizationBrandon Phelps8 Sep
                        • Re: Query OptimizationMihail Manolov8 Sep
                          • Re: Query OptimizationBrandon Phelps8 Sep
                            • Re: Query OptimizationMihail Manolov8 Sep
                              • Re: Query OptimizationBrandon Phelps8 Sep
                • Re: Query OptimizationMihail Manolov8 Sep
                  • Re: Query OptimizationBrandon Phelps8 Sep
                    • Re: Query OptimizationAndrew Moore8 Sep
                      • Re: Query OptimizationBrandon Phelps8 Sep