List:Internals« Previous MessageNext Message »
From:Sergei Golubchik Date:April 20 2006 4:49pm
Subject:Re: k-Nearest Neighbor query (was Re: Functions that return more than one row)
View as plain text  
Hi!

On Apr 16, JJ Tavernier wrote:
> Thanks a bunch.
> 
> >
> > As I wrote, you cannot return many rows from a function.
> > What you can do - change the syntax to have a function that returns only
> > one value, as a regular function. E.g. a function that returns TRUE if the
> > row is kNN and FALSE otherwise. Then you can query it as
> >
> >   SELECT * FROM geom2 WHERE k_nearest_neighbor(geom_column, @g, 3)
> >
> > Or make it a function that returns a rank of the neighbor, then it can
> > be queried as
> >
> >   SELECT * FROM geom2 WHERE k_nearest_neighbor(geom_column, @g) <= 3
> >
> > In any case, the syntax will be the smallest of your problems :)
> > Adding necessary access methods to R-tree and the handler interface,
> > putting knowledge about kNN searches into the optimizer is going to be
> > more difficult.
> 
> Yeah, I'm starting to understand that there is not going to be a
> simple solution to this:)
> I've started reading through some of the internals documentation, and
> looking through the rtree and select statement code.
> 
> As I see it, here are the high level modifications necessary:
> 1) rt_index.c: add the necessary logic to traverse the rtree index
> from the root, perform the kNN search along the way

Yes

> 2a) udf_knn.cpp: this is the definition of the k_nearest_neighbor
> function. Somehow, when this is called, it needs to get access to the
> spatial index (is this the MI_INFO structure?) for the geom_column for
> the geom2 table. The index is needed so that I can pass it to the kNN
> methods in rt_index.c. I could perform the kNN search when the init
> function is called, then return 1 in the xxx() function if the
> geom_column key is nearest neighbor k or less. Here are some
> questions:
>      - Is it possible to get ahold of the spatial index in a UDF?

No.

>      - How do I go about limiting the number of times that
> k_nearest_neighbor function is called (so that a full table scan isn't
> performed)? I suppose this will involve the optimizer.
> 
> OR
> 
> 2b) If there's no way to have a UDF get access to the spatial index on

Right, no way. UDF interface is too limited.

> a column, I'll have to make the internals aware of the kNN
> functionality. Somehow, the fact that the user wants the kNN needs to
> be retained during processing, which suggests that the parser needs to
> understand k_nearest_neighbor as a new operator. Any ideas on where I
> should look to add this? If we simplify and say that k == 1 (i.e., we
> are only interested in the nearest neighbor), then I think that a
> function similar to mi_rkey (in mi_rkey.c) might be sufficient (maybe
> this could be called mi_rknnkey). How would I go about making the
> parser aware that this new method needs to be called rather than
> mi_rkey?

Let's no simplify to k==1 case - it's too simple. That is it suggests
solutions you cannot generalize for k>1 case.

So what you may need to do is:

1. Add the necessary logic to rt_index.c (that is in MyISAM)
2. Extend handler API to support kNN search, so that SQL layer could ask
   MyISAM - via handler API - to do such a search.

   * it's basically as simple as adding new value to enum ha_rkey_function

3. add support for kNN searches to optimizer - so that it could know
   when to use r-tree index.

   * see, for example, how other spatial functions do to.

That's all :)

Regards,
Sergei

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <serg@stripped>
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Kerpen, Germany
       <___/  www.mysql.com
Thread
k-Nearest Neighbor query (was Re: Functions that return more than one row)JJ Tavernier16 Apr
  • Re: k-Nearest Neighbor query (was Re: Functions that return more than one row)Sergei Golubchik20 Apr