List: General Discussion « Previous MessageNext Message » From: Werner Van Belle Date: March 31 2010 4:58am Subject: Re: How to deal with 96 Dimensional Points ? View as plain text
```Hello Chris,

The use case I'
m talking about is actually a typical usecase for GIS applications: give
me the x closest points to this one. E.g: give me the 10 points closest
to (1,2,79) or in my case: give me the 100 points closest to
(x1,....x96). A query like yours might be possible and might be a good
solution if we would know the radius in which we are looking for the
points, but this is not really the case: we merely want a list returned
ordered by distance. Solving this with your solution is possible but is
quite slow. There exists nice datastructures to deal with this type of
problem as said and these are used in the GIS implementation in MySql.

Chris W wrote:
> I'm not sure why, but it seems that some people, I don't mean to imply
> that you are one of them, think there is some magic MySQL can preform
> to find points with in a given radius using the GIS extension.  There
> is no magic.  They simply use the well known math required to
> determine what points are inside the circle.
GIS extenstions are also not only about distances: the above query is
better solved with specialized datastructures.
> I could be wrong but I doubt there is any way to create an index that
> can directly indicate points with a a certain distance of other points
> unless that index included the distance from every point to every
> other point.  That is obviously not practical since with a set of only
> 14 points the index would have over 6 billion entries.
Partitioning of the space such as done in 3D render engines do solve
this problem more efficiently than having a list of all pairtwise
distances.  So the question is not whether such algorithms exist, it is
rather whether they are available in/through MySql.

> lets call each of your dimensions d1, d2, d3 .... d96. If you create
> an index on d1, d2, .... d69, you can then create a simple query that
> will quickly find all points that will find all points that are with
> in a bounding box.  Since this query is going to get a bit large with
> 96 dimensions, I would use code to create the query.  I will use php.
> dimensions being in an array TestPointD[1] = x, TestPointD[2] = . . .
>
> \$select = 'SELECT `PointID`, ';
> \$where = 'WHERE ';
> foreach(\$TestPointD as \$i => \$d){
>  \$di = 'd' . "\$i";
>  \$select .= "`\$di`, "
>  \$MinD = \$d - \$r;
>  \$MaxD = \$d + \$r;
>  \$where .= "`\$di` >= '\$MinD' AND `\$di` <= '\$MaxD' AND ";
> }
> \$select = substr(\$select, 0, -2);  //trim of the trailing comma and space
> \$where = substr(\$where, 0, -4);  //trim off the trailing 'AND '
>
> \$query = "\$select FROM `points` \$where";
>
Thanks for the nice illustration. In this case with the proper indices
this will indeed split the space in sections; nevertheless this approach
has great difficulties returning an ordered list of distances and
prefereably only the 100 closest ones at that.

Wkr,

--
http://werner.yellowcouch.org/

Attachment: [application/pgp-signature] OpenPGP digital signature signature.asc
```