List:General Discussion« Previous MessageNext Message »
From:Peter Thomassen Date:February 22 2009 8:10pm
Subject: Problem using deterministic stored function
View as plain text  
Hi,

I am using MySQL 5.0.51a. I've got a problem with a stored function. It
reads as follows:

> CREATE FUNCTION `_contractRoot`(temp INT) RETURNS int(11)
> 	READS SQL DATA
> BEGIN
> 	DECLARE _parent_id INT;
> 	
> 	REPEAT
> 		SET _parent_id = temp;
> 		SELECT parent_id INTO temp FROM contract WHERE id = _parent_id;
> 		UNTIL temp IS NULL
> 	END REPEAT;
> 	RETURN _parent_id;
> END

There is a table "contract" containing row groups that form a tree. In
the table, the columns "id", "parent_id" and "number" are defined. Each
tree root has set the NULL value for the "parent_id" column, while the
child rows have references to the ID of another row, and by following
these references from any child, the root ID will be calculated by this
function. Additionaly, there is a value set in the "number" column if
parent_id IS NOT NULL, i.e. the root row has a value set, while the
other rows have number = NULL. This maps each tree to a number.

(There are 1.500 rows in the "contract" table.)

Now, let's look at these queries:

mysql> SELECT _contractRoot(320);
+--------------------+
| _contractRoot(320) |
+--------------------+
|                317 |
+--------------------+
1 row in set (0.00 sec)

mysql> SELECT number FROM contract WHERE id = _contractRoot(320);
+--------+
| number |
+--------+
|     93 |
+--------+
1 row in set (0.06 sec)

As you see, _contractRoot(320) is run in a very short period of time,
while the second statement seems to run the _contractRoot function for
each line that is processed by the WHERE clause, and therefore takes longer.

This was not the case before the upgrade to Debian Lenny. Before (MySQL
5.0.32), the second statement also took nearly 0.00 sec. So, is this a
regression?


I noticed that the problem can be fixed by specifying DETERMINISTIC as
an additional keyword in the CREATE FUNCTION statement. In this case,
the second statement also runs in 0.00 sec.

But I'm not sure if DETERMINISTIC is legal here. As you see, the
function reads data from the table, and it may be that these data
changes. So the question is what DETERMINISTIC means!
Two options:
1.)
DETERMINISTIC means that the function does not depend on variable data
and will _always_ return the same value. This corresponds to what the
manual says: "A procedure or function is considered “deterministic” if
it always produces the same result for the same input parameters, and
“not deterministic” otherwise." But I'm not sure if "always" can be
construed that strictly here. It would be inconsistent with the "READS
SQL DATA" characteristic.
2.)
DETERMINISTIC means that the function does not use any non-constant
input except data from the database tables (i.e. no CURRENT_DATE(),
random numbers etc.). How would the caching mechanism work in this case?

In the second case, I may declare my function DETERMINISTIC, in the
first case I may not. Does anybody know what is right here?


I did another observations that is closely related to this: Even with
the DETERMINISTIC keyword, the following takes long:

mysql> SELECT id, _contractRoot(320) FROM contract;
+------+--------------------+
| id   | _contractRoot(320) |
+------+--------------------+
...
| 1560 |                317 |
| 1561 |                317 |
+------+--------------------+
1477 rows in set (0.06 sec)

The run time seems to be independent of the use of DETERMINISTIC, but to
my understanding, there is no need to execute the function more often
than when doing

mysql> SELECT number FROM contract WHERE id = _contractRoot(320);

which returns after 0.00 sec. As I said, this behaviour happens despite
of DETERMINISTIC. I'm not sure if this is a bug.

Thanks,
Peter

Thread
Problem using deterministic stored functionPeter Thomassen22 Feb