List:General Discussion« Previous MessageNext Message »
From:Christian Mack Date:May 7 1999 5:56pm
Subject:Re: A select query
View as plain text  
Martin Oldfield wrote:
> 
> I'm trying to speed up a select and I wonder if the list's collective
> wisdom could save me some time. I have a table (called raw) of
> observations. Each observation has an identification tag (called
> cmark), an observation time (obstime), and a bunch of other
> rubbish. I need to find matches in the table i.e. pairs of raw
> observations with the same cmark, but a different observation time.
> 
> The raw table grows quite quickly so that it has perhaps a half a
> million entries at the end of the day. Every five minutes or so, I'd
> like to look for any new matches and add them to a match
> table. There's an extra column in the raw table which gets set to 1
> when the observation has been seen. Any new match must have one unseen
> `end'. By indexing the seen column, I'd hoped that mySQL would simply
> loop over the new observations rather the the entire raw table so that
> the time for the select would depend only weakly on the number of old
> observations in the raw table. Empirically this seems not be the
> case. Ideally I'd like to be able to keep perhaps 2-3 million raw
> observations lying around.
> 
> Is this strategy a good one, or are there other techniques I should
> use ? Which buffers are being used in the match ?
> 
> Would it be sensible to have two tables for raw observations
> corresponding to `seen' and `unseen' entries in the tables I have now?
> Then I can iterate over the unseen table only (admittedly three times
> though).
> 
> Anyway, at present to do all this I use the select command:
> 
> select up.id, dn.id from raw up, raw dn
>         where (up.seen = 0 or dn.seen = 0)
>         and (up.cmark = dn.cmark) and
>         (dn.obstime - up.obstime) > 0 and (dn.obstime - up.obstime) < 3600;
> 
> which is explained as:
> 
> +-------+------+---------------+-------+---------+----------+--------+------------+
> | table | type | possible_keys | key   | key_len | ref      | rows   | Extra      |
> +-------+------+---------------+-------+---------+----------+--------+------------+
> | up    | ALL  | cmark,seen    | NULL  |    NULL | NULL     | 196671 |            |
> | dn    | ref  | cmark,seen    | cmark |      15 | up.cmark |     10 | where used |
> +-------+------+---------------+-------+---------+----------+--------+------------+
> 
> The raw table itself is:
> 
> mysql> describe raw;
> +----------+-------------+------+-----+---------+----------------+
> | Field    | Type        | Null | Key | Default | Extra          |
> +----------+-------------+------+-----+---------+----------------+
> | id       | int(11)     |      | PRI | 0       | auto_increment |
> | cmark    | char(15)    |      | MUL |         |                |
> | obstime  | int(11)     |      |     | 0       |                |
> | node     | int(11)     |      |     | 0       |                |
> | imgfile  | char(24)    |      |     |         |                |
> | logprob  | float(10,2) | YES  |     | NULL    |                |
> | evidence | float(10,2) | YES  |     | NULL    |                |
> | seen     | int(11)     |      | MUL | 0       |                |
> +----------+-------------+------+-----+---------+----------------+
> 8 rows in set (0.01 sec)
> 
> Thanks in arrears for mySQL, and in advance for any insights.
> 
> Cheers,
> --
> Martin Oldfield.

Hi Martin

Mysql has problems with OR's in the WHERE part.
I would try to change the SELECT to this:
SELECT
	up.id
	,dn.id
FROM
	raw AS up
	, raw AS dn
WHERE
	(up.seen = 0
	AND up.cmark = dn.cmark
	AND dn.obstime - up.obstime > 0
	AND dn.obstime - up.obstime < 3600
	) OR
	(dn.seen = 0
	AND up.cmark = dn.cmark
	AND dn.obstime - up.obstime > 0
	AND dn.obstime - up.obstime < 3600
	)

Hope this helps
Christian

Thread
A select queryMartin Oldfield7 May
Re: A select queryChristian Mack7 May
  • Re: A select queryMartin Ramsch7 May
  • Re: A select queryMartin Oldfield10 May