Thanks a bunch.
>
> As I wrote, you cannot return many rows from a function.
> What you can do - change the syntax to have a function that returns only
> one value, as a regular function. E.g. a function that returns TRUE if the
> row is kNN and FALSE otherwise. Then you can query it as
>
> SELECT * FROM geom2 WHERE k_nearest_neighbor(geom_column, @g, 3)
>
> Or make it a function that returns a rank of the neighbor, then it can
> be queried as
>
> SELECT * FROM geom2 WHERE k_nearest_neighbor(geom_column, @g) <= 3
>
> In any case, the syntax will be the smallest of your problems :)
> Adding necessary access methods to R-tree and the handler interface,
> putting knowledge about kNN searches into the optimizer is going to be
> more difficult.
Yeah, I'm starting to understand that there is not going to be a
simple solution to this :)
I've started reading through some of the internals documentation, and
looking through the rtree and select statement code.
As I see it, here are the high-level modifications necessary:
1) rt_index.c: add the necessary logic to traverse the rtree index
from the root and perform the kNN search along the way
2a) udf_knn.cpp: this is the definition of the k_nearest_neighbor
function. Somehow, when this is called, it needs to get access to the
spatial index (is this the MI_INFO structure?) for the geom_column for
the geom2 table. The index is needed so that I can pass it to the kNN
methods in rt_index.c. I could perform the kNN search when the init
function is called, then return 1 in the xxx() function if the
geom_column key is among the k nearest neighbors. Here are some
questions:
- Is it possible to get ahold of the spatial index in a UDF?
- How do I go about limiting the number of times that the
k_nearest_neighbor function is called (so that a full table scan isn't
performed)? I suppose this will involve the optimizer.
OR
2b) If there's no way to have a UDF get access to the spatial index on
a column, I'll have to make the internals aware of the kNN
functionality. Somehow, the fact that the user wants the kNN needs to
be retained during processing, which suggests that the parser needs to
understand k_nearest_neighbor as a new operator. Any ideas on where I
should look to add this? If we simplify and say that k == 1 (i.e., we
are only interested in the nearest neighbor), then I think that a
function similar to mi_rkey (in mi_rkey.c) might be sufficient (maybe
this could be called mi_rknnkey). How would I go about making the
parser aware that this new method needs to be called rather than
mi_rkey?
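
To make step 1 a bit more concrete, here is a rough sketch of the kind of
traversal I have in mind: a best-first search from the root that keeps a
priority queue ordered by the distance from the query point to each bounding
rectangle. It runs over a small in-memory tree, not over the real MyISAM
pages. Rect, Node, min_dist2 and knn_search are all names I made up for
illustration and do not exist in rt_index.c. It also assumes the distance to
an entry's bounding rectangle is good enough, which is exact only for point
data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-ins for index pages and keys; not MyISAM structures.
struct Rect { double xmin, ymin, xmax, ymax; };

struct Node {
    Rect mbr;                    // minimum bounding rectangle
    bool is_entry;               // true: data entry (a row); false: index node
    std::vector<Node> children;  // used only when is_entry is false
    long row_id;                 // used only when is_entry is true
};

// Squared distance from (px, py) to the closest point of r; 0 if inside r.
double min_dist2(const Rect& r, double px, double py) {
    double dx = std::max({r.xmin - px, 0.0, px - r.xmax});
    double dy = std::max({r.ymin - py, 0.0, py - r.ymax});
    return dx * dx + dy * dy;
}

struct QueueItem {
    double dist2;
    const Node* node;
};

// Orders the priority queue so that the smallest distance is on top.
struct Farther {
    bool operator()(const QueueItem& a, const QueueItem& b) const {
        return a.dist2 > b.dist2;
    }
};

// Returns up to k row ids, nearest first. The position in the result
// (plus one) is the rank of that neighbor.
std::vector<long> knn_search(const Node& root, double px, double py,
                             std::size_t k) {
    std::priority_queue<QueueItem, std::vector<QueueItem>, Farther> pq;
    pq.push({min_dist2(root.mbr, px, py), &root});
    std::vector<long> result;
    while (!pq.empty() && result.size() < k) {
        QueueItem top = pq.top();
        pq.pop();
        if (top.node->is_entry) {
            // Nothing left in the queue can be closer than this entry,
            // because a node's MBR distance never exceeds the distance
            // to anything stored beneath it.
            result.push_back(top.node->row_id);
        } else {
            for (const Node& child : top.node->children)
                pq.push({min_dist2(child.mbr, px, py), &child});
        }
    }
    return result;
}
```

The nice property is that subtrees are only opened when their bounding
rectangle could still contain one of the k nearest entries, so most of the
index is never read. With k == 1 the loop stops at the first data entry
popped, which is roughly what I would expect something like mi_rknnkey to do.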
Thanks for your help. Any pointers you can provide are appreciated.
JJ