MySQL Lists are EOL. Please join:

List:General Discussion« Previous MessageNext Message »
From:Chris Stoughton Date:October 11 2002 3:08pm
Subject:Performance when using two BETWEEN statements in the WHERE clause
multiple times
View as plain text  
I sent a similar question a few days ago.  I don't think there was a 
response.  If there was, sorry that I missed it.  I have worked around 
the issue, but would like to know whether there is something I can do to 
improve the orignal query.

I have a table with two spatial indices -- ra and decl, for right 
ascension and declination , think of them as x,y coordinates.  In order 
to match objects in one table to a second table, I choose a set of 
objects in the first table, find the limits of ra,decl, and then query 
the second table based on these limits.  I then do matching in a 
separate program, between these two lists.

For a specific example of one pair of queries:

select ra,decl from firstTable where fieldId=1
(based on the results of this query, calculate raMin,raMac, declMin, and 
declMax -- 1.1, 1.2, 3.4, 3.5 in this example)
select ra,decl,a,b,c,d from secondTable where ra between 1.1 1.2 and 
decl between 3.4 3.5

So, I wind up sending the following sequence of series:

select ra,decl from firstTable where fieldId=0
select ra,decl,a,b,c,d from secondTable where ra between 1.1 1.2 and 
decl between 3.4 3.5
select ra,decl from firstTable where fieldId=1
select ra,decl,a,b,c,d from secondTable where ra between 1.1 1.2 and 
decl between 3.5 3.6
select ra,decl from firstTable where fieldId=2
select ra,decl,a,b,c,d from secondTable where ra between 1.1 1.2 and 
decl between 3.6 3.7
select ra,decl from firstTable where fieldId=3
select ra,decl,a,b,c,d from secondTable where ra between 1.1 1.2 and 
decl between 3.7 3.8
select ra,decl from firstTable where fieldId=4
select ra,decl,a,b,c,d from secondTable where ra between 1.1 1.2 and 
decl between 3.8 3.9
select ra,decl from firstTable where fieldId=5
select ra,decl,a,b,c,d from secondTable where ra between 1.1 1.2 and 
decl between 3.9 4.0
select ra,decl from firstTable where fieldId=6
select ra,decl,a,b,c,d from secondTable where ra between 1.1 1.2 and 
decl between 4.0 4.1
select ra,decl from firstTable where fieldId=7
select ra,decl,a,b,c,d from secondTable where ra between 1.2 1.3 and 
decl between 3.5 3.6
select ra,decl from firstTable where fieldId=8
select ra,decl,a,b,c,d from secondTable where ra between 1.2 1.3 and 
decl between 3.6 3.7
select ra,decl from firstTable where fieldId=9
select ra,decl,a,b,c,d from secondTable where ra between 1.2 1.3 and 
decl between 3.7 3.8
select ra,decl from firstTable where fieldId=10
select ra,decl,a,b,c,d from secondTable where ra between 1.2 1.3 and 
decl between 3.8 3.9
select ra,decl from firstTable where fieldId=11
select ra,decl,a,b,c,d from secondTable where ra between 1.2 1.3 and 
decl between 3.9 4.0
select ra,decl from firstTable where fieldId=12
select ra,decl,a,b,c,d from secondTable where ra between 1.2 1.3 and 
decl between 4.0 4.1
and so on.....

Each of these selects returns approx 200 objects from the million 
records in the table.

The problem is that the first 40 or so queries take less than a second 
each, which is very nice performance.  However, after that, things bog 
down.  The queries with "where fieldId=n" continue to perform well, but 
the queries with the two betweens take longer and longer, over a minute 
per query.  Not good!  While this is happening, there is no process 
consuming CPU, no swapping, and no heavy disk activity.

Is there a way I can create the indices to optimize these queries?

(The work around is to add another field in targetTsObj, called 
radeclId.  This is an identifier for the grid that each ra,decl pair 
fall into. I build a grid of boxes that cover the legal range of 
ra,decl, and each grid has a uniqu radeclId.  Before loading, I 
calculate the radeclId for each object.  Then, to do this query, I then 
recast the where clause of the query from "ra between 1.1 1.2 and decl 
between 4.0 4.1" to "radeclId=id1 || radeclId=id2 || radeclId=id3 ..." 
where the list id1, id2, id3, .... is calculated from the ra,decl 
ranges.  Each one of these queries return in well under a second each, 
and do not bog down.  Therefore, I conclude that the disk and memory are 
performing well, and that I need to fix how I use indices.)


Here is what the indices look like for the secondTable, called 
targetTsObj in this example:
mysql> show index from targetTsObj;
+-------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+---------+
| Table       | Non_unique | Key_name  | Seq_in_index | Column_name | 
Collation | Cardinality | Sub_part | Packed | Comment |
+-------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+---------+
| targetTsObj |          1 | objId     |            1 | objId       | 
A         |     1228983 |     NULL | NULL   |         |
| targetTsObj |          1 | fieldId   |            1 | fieldId     | 
A         |        1985 |     NULL | NULL   |         |
| targetTsObj |          1 | bestObjId |            1 | bestObjId   | 
A         |           1 |     NULL | NULL   |         |
| targetTsObj |          1 | ra        |            1 | ra          | 
A         |     1228983 |     NULL | NULL   |         |
| targetTsObj |          1 | decl      |            1 | decl        | 
A         |     1228983 |     NULL | NULL   |         |
| targetTsObj |          1 | u         |            1 | u           | 
A         |      111725 |     NULL | NULL   |         |
| targetTsObj |          1 | g         |            1 | g           | 
A         |      122898 |     NULL | NULL   |         |
| targetTsObj |          1 | r         |            1 | r           | 
A         |      122898 |     NULL | NULL   |         |
| targetTsObj |          1 | i         |            1 | i           | 
A         |      122898 |     NULL | NULL   |         |
| targetTsObj |          1 | z         |            1 | z           | 
A         |      111725 |     NULL | NULL   |         |
| targetTsObj |          1 | ug        |            1 | u           | 
A         |      111725 |     NULL | NULL   |         |
| targetTsObj |          1 | ug        |            2 | g           | 
A         |     1228983 |     NULL | NULL   |         |
| targetTsObj |          1 | gr        |            1 | g           | 
A         |      122898 |     NULL | NULL   |         |
| targetTsObj |          1 | gr        |            2 | r           | 
A         |     1228983 |     NULL | NULL   |         |
| targetTsObj |          1 | ri        |            1 | r           | 
A         |      122898 |     NULL | NULL   |         |
| targetTsObj |          1 | ri        |            2 | i           | 
A         |     1228983 |     NULL | NULL   |         |
| targetTsObj |          1 | iz        |            1 | i           | 
A         |      122898 |     NULL | NULL   |         |
| targetTsObj |          1 | iz        |            2 | z           | 
A         |     1228983 |     NULL | NULL   |         |
+-------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+---------+
18 rows in set (0.01 sec)


Note that the "collation" is always "A" -- what can I read to learn what 
the different options are?  Are the indices built correctly?


Here are more specifics of our installation.  Please let me know what 
additional information will help diagnose this problem.
===================================================================

I send the queries with a client program, using these C api's:

mysql_real_connect(mysql, host, user, passwd, db, port, unix_socket, 
client_flag);

mysql_query(mysql, query);

mysql_store_result(mysql);

In a loop, until all rows are fetched:
field = mysql_fetch_fields(mysql_res);

And after getting all the rows for a query:
mysql_free_result(mysql_res);


Finally, here is /etc/my.cnf:

bash-2.04$ more /etc/my.cnf
[mysqld]
set-variable = key_buffer_size=512M
set-variable = table_cache=512
set-variable = query_buffer_size=20M
# set-variable = sort_buffer=100M
# set-variable = read_buffer_size=100M
datadir=/export/data/dp20.a/data/mysql
socket=/var/lib/mysql/mysql.sock
   [mysql.server]
user=mysql
basedir=/var/lib

[safe_mysqld]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid








Thread
Performance when using two BETWEEN statements in the WHERE clausemultiple timesChris Stoughton11 Oct
  • Re: Performance when using two BETWEEN statements in the WHEREclause multiple timesMartin Waite14 Oct