From: Derek Downey Date: September 8 2011 6:55pm Subject: Re: Query Optimization List-Archive: http://lists.mysql.com/mysql/225671 Message-Id: <6ABCAF79-128B-4C36-B535-E7898CF4AAF3@orange-pants.com> MIME-Version: 1.0 (Apple Message framework v1244.3) Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Correct me if I'm wrong. You're wanting to get all records that have an = open_date or a close_date between two times. If that's correct, you might be able to get an index_merge by doing a = query like: WHERE ((starting time)<=3Dopen_dt<=3D (ending time)) OR ((starting = time)<=3Dclose_dt<=3D(ending time)) and creating two indexes (one on 'open_dt' and the other on 'close_dt') http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html Regards, Derek On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote: > Andy, >=20 > The queries take minutes to run. MySQL is 5.1.54 and it's running on = Ubuntu server 11.04. Unfortunately the machine only has 2GB of RAM but = no other major daemons are running on the machine. We are running RAID = 1 (mirroring) with 1TB drives. The tables in question here are all = MyISAM. When running with the LIMIT 10 my EXPLAIN is: >=20 > = +----+-------------+-------+--------+---------------------+---------------= ------+---------+--------------------+----------+-------------------------= ----+ > | id | select_type | table | type | possible_keys | key = | key_len | ref | rows | Extra = | > = +----+-------------+-------+--------+---------------------+---------------= ------+---------+--------------------+----------+-------------------------= ----+ > | 1 | SIMPLE | sc | range | ndx_open_close_rcvd | = ndx_open_close_rcvd | 8 | NULL | 32393316 | Using = where; Using filesort | > | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY = | 2 | syslog.sc.src_port | 1 | = | > | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY = | 2 | syslog.sc.dst_port | 1 | = | > = +----+-------------+-------+--------+---------------------+---------------= ------+---------+--------------------+----------+-------------------------= ----+ >=20 > When I remove the LIMIT 10 I get: >=20 > = ----+-------------+-------+--------+---------------------+---------+------= ---+--------------------+----------+-----------------------------+ > | id | select_type | table | type | possible_keys | key | = key_len | ref | rows | Extra | > = +----+-------------+-------+--------+---------------------+---------+-----= ----+--------------------+----------+-----------------------------+ > | 1 | SIMPLE | sc | ALL | ndx_open_close_rcvd | NULL | = NULL | NULL | 32393330 | Using where; Using filesort | > | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | = 2 | syslog.sc.src_port | 1 | | > | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | = 2 | syslog.sc.dst_port | 1 | | > = +----+-------------+-------+--------+---------------------+---------+-----= ----+--------------------+----------+-----------------------------+ >=20 > Thanks for all your help thus far. >=20 > On 09/08/2011 02:38 PM, Andrew Moore wrote: >> I don't think I saw any query timings in the emails (maybe I missed = them). >>=20 >> What version of MySQL are you currently using? >> What does the explain look like when your remove the limit 10? >> Is your server tuned for MyISAM or InnoDB? >> What kind of disk setup is in use? >> How much memory is in your machine? >>=20 >>=20 >> On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps = wrote: >>=20 >>> Thanks for the reply Andy. Unfortunately the users will be = selecting >>> varying date ranges and new data is constantly coming in, so I am = not sure >>> how I could archive/cache the necessary data that would be any more >>> efficient than simply using the database directly. >>>=20 >>>=20 >>>=20 >>> On 09/08/2011 02:16 PM, Andrew Moore wrote: >>>=20 >>>> Thinking outside the query, is there any archiving that could = happen to >>>> make >>>> your large tables kinder in the range scan? >>>>=20 >>>> Andy >>>>=20 >>>> On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps = wrote: >>>>=20 >>>> On 09/01/2011 01:32 PM, Brandon Phelps wrote: >>>>>=20 >>>>> On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote: >>>>>>=20 >>>>>> On 9/1/2011 09:42, Brandon Phelps wrote: >>>>>>>=20 >>>>>>> On 09/01/2011 04:59 AM, Jochem van Dieten wrote: >>>>>>>>=20 >>>>>>>>> ... >>>>>>>>>>=20 >>>>>>>>>=20 >>>>>>>> WHERE >>>>>>>>>> (open_dt>=3D '2011-08-30 00:00:00' OR close_dt>=3D = '2011-08-30 >>>>>>>>>>=20 >>>>>>>>> 00:00:00') >>>>>>>>=20 >>>>>>>>> AND (open_dt<=3D '2011-08-30 12:36:53' OR close_dt<=3D = '2011-08-30 >>>>>>>>>>=20 >>>>>>>>> 12:36:53') >>>>>>>>=20 >>>>>>>> In that case your logic here simplifies to: >>>>>>>>> WHERE >>>>>>>>> open_dt>=3D '2011-08-30 00:00:00' >>>>>>>>> AND >>>>>>>>> close_dt<=3D '2011-08-30 12:36:53' >>>>>>>>>=20 >>>>>>>>=20 >>>>>>>> Now add an index over open_dt and close_dt and see what = happens. >>>>>>>>>=20 >>>>>>>>=20 >>>>>>>> Jochem >>>>>>>>>=20 >>>>>>>>=20 >>>>>>>> Jochem, >>>>>>>>=20 >>>>>>>> I can't really use your WHERE logic because I also need to = retrieve >>>>>>>> results where the open_dt time is out of the range specified. = For >>>>>>>> example, a very large file download might span multiple days so = given >>>>>>>> your logic if the connection was started 2 days ago and I want = to pull >>>>>>>> 1 >>>>>>>> days worth of connections, I would miss that entry. Basically I = want >>>>>>>> to >>>>>>>> SELECT all of the records that were opened OR closed during the >>>>>>>> specified time period, ie. if any activity happened between my = start >>>>>>>> and >>>>>>>> end dates, I need to see that record. >>>>>>>>=20 >>>>>>>> Any other ideas? >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> I believe Jochem was on the right track but he got his dates >>>>>>> reversed. >>>>>>>=20 >>>>>>> Let's try a little ASCII art to show the situation. I will setup = a >>>>>>> query >>>>>>> window with two markers (s) and (e). Events will be marked by = |----| >>>>>>> markers >>>>>>> showing their durations. >>>>>>>=20 >>>>>>> a) (s) (e) >>>>>>> b) |---| >>>>>>> c) |---| >>>>>>> d) |---| >>>>>>> e) |--------------------| >>>>>>> f) |---| >>>>>>> g) |---| >>>>>>>=20 >>>>>>> To describe these situations: >>>>>>> a) is the window for which you want to query (s) is the starting = time >>>>>>> and >>>>>>> (e) is the ending time for the date range you are interested in. >>>>>>> b) the event starts and stops before your window exists. It = won't be >>>>>>> part >>>>>>> of your results. >>>>>>> c) the event starts before the window but ends within the window = - >>>>>>> include this >>>>>>> d) the event starts and ends within the window - include this >>>>>>> e) the event starts before the window and ends after the window = - >>>>>>> include >>>>>>> this >>>>>>> f) the event starts inside the window but ends beyond the window = - >>>>>>> include this. >>>>>>> g) the event starts and ends beyond the window - exclude this. >>>>>>>=20 >>>>>>> In order to get every event in the range of c-f, here is what = you need >>>>>>> for a WHERE clause >>>>>>>=20 >>>>>>> WHERE start<=3D (ending time) and end>=3D (starting time) >>>>>>>=20 >>>>>>> Try that and let us know the results. >>>>>>>=20 >>>>>>>=20 >>>>>> Thanks Jochem and Shawn, however the following two queries result = in the >>>>>> exact same EXPLAIN output: (I hope the tables don't wrap too = early for >>>>>> you) >>>>>>=20 >>>>>> Old method: >>>>>> SELECT >>>>>> sc.open_dt, >>>>>> sc.close_dt, >>>>>> sc.protocol, >>>>>> INET_NTOA(sc.src_address) AS src_address, >>>>>> sc.src_port, >>>>>> INET_NTOA(sc.dst_address) AS dst_address, >>>>>> sc.dst_port, >>>>>> sc.sent, >>>>>> sc.rcvd, >>>>>> spm.desc AS src_port_desc, >>>>>> dpm.desc AS dst_port_desc >>>>>> FROM firewall_connections AS sc >>>>>> LEFT JOIN port_mappings AS spm ON spm.port =3D sc.src_port >>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port =3D sc.dst_port >>>>>> WHERE >>>>>> (open_dt>=3D '2011-08-31 09:53:31' OR close_dt>=3D '2011-08-31 = 09:53:31') >>>>>> AND (open_dt<=3D '2011-09-01 09:53:31' OR close_dt<=3D = '2011-09-01 >>>>>> 09:53:31') >>>>>> ORDER BY rcvd DESC >>>>>> LIMIT 0, 10; >>>>>>=20 >>>>>> New method with BTREE index on open_dt, close_dt (index name is >>>>>> ndx_open_close_dt): >>>>>> SELECT >>>>>> sc.open_dt, >>>>>> sc.close_dt, >>>>>> sc.protocol, >>>>>> INET_NTOA(sc.src_address) AS src_address, >>>>>> sc.src_port, >>>>>> INET_NTOA(sc.dst_address) AS dst_address, >>>>>> sc.dst_port, >>>>>> sc.sent, >>>>>> sc.rcvd, >>>>>> spm.desc AS src_port_desc, >>>>>> dpm.desc AS dst_port_desc >>>>>> FROM firewall_connections AS sc >>>>>> LEFT JOIN port_mappings AS spm ON spm.port =3D sc.src_port >>>>>> LEFT JOIN port_mappings AS dpm ON dpm.port =3D sc.dst_port >>>>>> WHERE >>>>>> open_dt<=3D '2011-09-01 09:53:31' AND close_dt>=3D '2011-08-31 = 09:53:31' >>>>>> ORDER BY rcvd DESC >>>>>> LIMIT 0, 10; >>>>>>=20 >>>>>> EXPLAIN output for old method: >>>>>> = +----+-------------+-------+--****------+---------------------**--** >>>>>> ----+----------+---------+----****----------------+------+----** >>>>>> --**-------+ >>>>>> | id | select_type | table | type | possible_keys | key | key_len = | ref >>>>>> | >>>>>> rows | Extra | >>>>>> = +----+-------------+-------+--****------+---------------------**--** >>>>>> ----+----------+---------+----****----------------+------+----** >>>>>> --**-------+ >>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd = | 4 | >>>>>> NULL | 10 | Using where | >>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | = syslog.sc.src_port >>>>>> | >>>>>> 1 | | >>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | = syslog.sc.dst_port >>>>>> | >>>>>> 1 | | >>>>>> = +----+-------------+-------+--****------+---------------------**--** >>>>>> ----+----------+---------+----****----------------+------+----** >>>>>> --**-------+ >>>>>>=20 >>>>>> EXPLAIN output for new method with new index: >>>>>> = +----+-------------+-------+--****------+---------------------**--** >>>>>> ----+----------+---------+----****----------------+------+----** >>>>>> --**-------+ >>>>>> | id | select_type | table | type | possible_keys | key | key_len = | ref >>>>>> | >>>>>> rows | Extra | >>>>>> = +----+-------------+-------+--****------+---------------------**--** >>>>>> ----+----------+---------+----****----------------+------+----** >>>>>> --**-------+ >>>>>> | 1 | SIMPLE | sc | index | open_dt,ndx_open_close_dt | ndx_rcvd = | 4 | >>>>>> NULL | 10 | Using where | >>>>>> | 1 | SIMPLE | spm | eq_ref | PRIMARY | PRIMARY | 2 | = syslog.sc.src_port >>>>>> | >>>>>> 1 | | >>>>>> | 1 | SIMPLE | dpm | eq_ref | PRIMARY | PRIMARY | 2 | = syslog.sc.dst_port >>>>>> | >>>>>> 1 | | >>>>>> = +----+-------------+-------+--****------+---------------------**--** >>>>>> ----+----------+---------+----****----------------+------+----** >>>>>> --**-------+ >>>>>>=20 >>>>>> SHOW INDEX: >>>>>> = +----------------------+------****------+-------------------+-**--** >>>>>> = -----------+-------------+----****-------+-------------+------**--** >>>>>> --+--------+------+-----------****-+---------+ >>>>>> | Table | Non_unique | Key_name | Seq_in_index | Column_name | = Collation >>>>>> | >>>>>> Cardinality | Sub_part | Packed | Null | Index_type | Comment | >>>>>> = +----------------------+------****------+-------------------+-**--** >>>>>> = -----------+-------------+----****-------+-------------+------**--** >>>>>> --+--------+------+-----------****-+---------+ >>>>>> | firewall_connections | 1 | ndx_open_close_dt | 1 | open_dt | A = | >>>>>> 1342691 >>>>>> | NULL | NULL | | BTREE | | >>>>>> | firewall_connections | 1 | ndx_open_close_dt | 2 | close_dt | A = | >>>>>> 6377783 | NULL | NULL | | BTREE | | >>>>>> = +----------------------+------****------+-------------------+-**--** >>>>>> = -----------+-------------+----****-------+-------------+------**--** >>>>>> --+--------+------+-----------****-+---------+ >>>>>>=20 >>>>>>=20 >>>>>> Although right now the queries do seem to be executing much = faster, >>>>>> although I'm not quite sure why. And I'm not sure why the new >>>>>> ndx_open_close_dt isn't being used either. >>>>>>=20 >>>>>> -Brandon >>>>>>=20 >>>>>>=20 >>>>>> I am still having a big issue with my query as seen above. The = table >>>>> is up >>>>> to around 32 million records at the moment and either of the two = SELECT >>>>> queries above take a very long time to run. Is there anything at = all I >>>>> can >>>>> do to speed things up? It seems that changing the format of the = WHERE >>>>> clause did not help at all, as the EXPLAIN output is exactly the = same for >>>>> both version. I also tried adding an index on (open_dt, close_dt, = rcvd) >>>>> but >>>>> that index does not get used. >>>>>=20 >>>>> Any other ideas? >>>>>=20 >>>>> Thanks in advance, >>>>>=20 >>>>>=20 >>>>> Brandon >>>>>=20 >>>>> -- >>>>> MySQL General Mailing List >>>>> For list archives: http://lists.mysql.com/mysql >>>>> To unsubscribe: = http://lists.mysql.com/mysql?**** >>>>> unsub=3Deroomydna@stripped>>>> = unsub=3Deroomydna@stripped >>>>>>=20 >>>>>=20 >>>>>=20 >>>>>=20 >>>>=20 >>> -- >>> MySQL General Mailing List >>> For list archives: http://lists.mysql.com/mysql >>> To unsubscribe: http://lists.mysql.com/mysql?** >>> = unsub=3Deroomydna@stripped >>>=20 >>>=20 >>=20 >=20 > --=20 > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: = http://lists.mysql.com/mysql?unsub=3Dderek@stripped >=20