Yves,
> My problem is that the sub-select in line 7
> ("SELECT 1") takes a rather long time
It might be possible to simplify. Do I have the schema right?
message (messageID)
keylist (keylistID)
tag ( tagID, readaccesskeylist references keylist(keylistID) )
message_revision_tag ( ???, messageID references message(messageID),
tagID references tag(tagID))
Or is there a message_revisions table missing from your description? On
what I have, if the condition tag.readaccesskeylist IS NOT NULL defines
a 'banned' condition, perhaps the query can be written without
referencing the keylist table, since the requirement simply translates
to finding all messages with no tag having a non-null readaccesskeylist,
i.e. ...
(i} find the messages which have a non-null tag.readaccesskeylist,
(ii) find the messages which are not in [i].
(i) Finding messages which have a deny-access tag looks like a simple join:
SELECT DISTINCT messageID
FROM message_revision_tag AS mrt
JOIN tag AS t ON mrt.tagID=t.tagID
WHERE t.readaccesskeylist IS NOT NULL;
(ii) We get the messages not in the above result with a simple exclusion
join:
SELECT messageID
FROM message m
LEFT JOIN (
SELECT DISTINCT messageID
FROM message_revision_tag AS mrt
JOIN tag AS t ON mrt.tagID=t.tagID
WHERE t.readaccesskeylist IS NOT NULL
) AS banned USING (messageID)
WHERE banned.messageID IS NULL,
Or did I miss something?
PB
http://www.artfulsoftware.com
-----
Yves Goergen wrote:
> Hi,
>
> I have a performance problem with one of my SQL queries. It's a rather
> complex one so I'll spare you the details. This is the situation:
>
> In my system, there's messages, tags and keylists. Each message has
> message_revisions, each message_revision can be assigned tags (stored
> in message_revision_tag). Each tag points to a keylist that contains
> all keys that grant access to messages with that tag. If a tag has no
> keylist assigned (ReadAccessKeylist IS NULL), then everybody may
> access the messages.
>
> This query finds all messages that don't have a tag assigned that
> would deny access to it. (Assume every message has only a single
> revision with the number 1, for now. The actual user comparison with
> another sub-select is hidden in <some more>.)
>
> SELECT m."MessageId"
> FROM "message" m
> WHERE
> NOT EXISTS
> (SELECT
> EXISTS
> (SELECT 1
> FROM "keylist" tk
> WHERE tk."KeylistId" = t."ReadAccessKeylist" AND <some more>)
> AS "Allowed"
> FROM "message_revision_tag" mrt
> JOIN "tag" t USING ("TagId")
> WHERE mrt."MessageId" = m."MessageId" AND
> mrt."RevisionNumber" = 1 AND
> t."ReadAccessKeylist" IS NOT NULL
> HAVING NOT "Allowed")
>
> My problem is that the sub-select in line 7 ("SELECT 1") takes a
> rather long time. (When I remove it, it's much faster.) I'm not sure
> why, because there's not a single keylist in that table, however.
> Another issue is that this query should actually never be regarded.
> The condition in the second-last line is always false. A simple test
> confirms that:
>
> SELECT COUNT(*) FROM "tag" WHERE "ReadAccessKeylist" IS NOT NULL
> -> 0
>
> So there should never be a reason why the FROM in line 11 would result
> in a row (filtering out with the conditions, of course). But it still
> gets executed. When I make sure that the condition is always false, by
> adding a "AND 0" just before the HAVING clause, the whole thing runs
> much faster.
>
> I have no separate timings, but it's in the magnitude of 1 vs. 5-10
> milliseconds for the query. Run a hundred times makes a noticeable delay.
>
> My understanding of it all was that first the FROM clause is regarded
> to see what rows there are. Then WHERE filters them, then SELECT will
> pick some columns (and thereby execute my sub-select expression) and
> finally HAVING filters again. Since in my theory there is no single
> row, SELECT has nothing to do. But obviously it has.
>
> Some suggestion what's going on?
>