Hi!
On Apr 12, JJ Tavernier wrote:
> On 4/12/06, Sergei Golubchik <serg@stripped> wrote:
> >
> > On Apr 11, JJ Tavernier wrote:
> > > Hi, I'd like to add a k-Nearest Neighbor functionality to mysql for
> > > spatial data. I'm thinking that the usage will look something like
> > > this:
> > >
> > > CREATE TABLE geom2 (g GEOMETRY);
> > >
> > > // ... insert some data
> > >
> > > SET @g = GeomFromText('POINT(1 1)');
> > > SELECT k_nearest_neighbor(@g, 3) FROM geom2;
> >
> > It would be probably very slow without a special index.
> > R-tree was not designed for this kind of task.
>
> It's my understanding that k-NN can be implemented efficiently with
> R-Trees. This paper gives one algorithm:
> http://citeseer.ist.psu.edu/roussopoulos95nearest.html. This paper
> gives another: http://www.cs.umd.edu/~hjs/pubs/incnear2.pdf.
Interesting. I didn't know that.
Thanks for pointers!
> > > The k_nearest_neighbor would then return the three closest objects to
> > > the point (1,1). From the portion of the manual on user-defined
> > > functions (http://dev.mysql.com/doc/refman/5.0/en/adding-udf.html), it
> > > looks like only strings, ints and reals can be returned from
> > > functions. How would I go about returning a row from a spatially
> > > indexed table? How do I deal with returning multiple rows? Am I wrong
> > > in thinking that functions are the way to go about solving this? I'm
> > > very new to mysql development, so I would not be surprised if this is
> > > the case.
> >
> > You cannot do it with an udf. Better to change the syntax to
> >
> > SELECT * FROM geom2 WHERE k_nearest_neighbor(geom2.geom_col, @g, 3)
> >
> > or
> >
> > SELECT * FROM geom2 WHERE k_nearest_neighbor(geom2.geom_col, @g) <= 3
> >
> > This could be done with an udf.
>
> R-tree can be traversed efficiently to find kNN, so I'd like to use
> the efficient algorithms (in papers above) instead of using full-table
> scans. If UDFs cannot be used to query the k-NN of a given object,
> what other options are available to me within mysql to return multiple
> rows
As I wrote, you cannot return many rows from a function.
What you can do - change the syntax to have a function that returns only
one value, as a regular function. E.g. a function that returns TRUE if the
row is kNN and FALSE otherwise. Then you can query it as
SELECT * FROM geom2 WHERE k_nearest_neighbor(geom_column, @g, 3)
Or make it a function that returns a rank of the neighbor, then it can
be queried as
SELECT * FROM geom2 WHERE k_nearest_neighbor(geom_column, @g) <= 3
In any case, the syntax will be the smallest of your problems :)
Adding necessary access methods to R-tree and the handler interface,
putting knowledge about kNN searches into the optimizer is going to be
more difficult.
Regards,
Sergei
--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@stripped>
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Senior Software Developer
/_/ /_/\_, /___/\___\_\___/ Kerpen, Germany
<___/ www.mysql.com