List: General Discussion « Previous MessageNext Message » From: Geert-Jan Brits Date: March 30 2010 11:54pm Subject: Re: How to deal with 96 Dimensional Points ? View as plain text
```Perhaps you could give us a (generalized) description of your use-case, so
we can better grasp what you want to achieve, and how you want to use it.
i.e: since I can't imagine/ envison a real 'eucledian distance' over 96
dimensions I bet you're talking a generalized distance function over N
dimenions.
This is usually only used in two general ways afaik: 1 calculating all
points that lie within a certain threshold (Chris' implementation prints
these points) 2 calculating an ordered top-M list of closests points to the
target point (Chris' implementation slightly altered) 3 (hmm maybe three:
clustering points based on their distance to eachother)
It helps if we know what you'r after. For instance: if you're points don't
change often and you want to achieve case 1 or 2  I would calculate these
once and all-at-once and save them in a seperate table, bc. the on-demand
variant may quickly become too slow. again depending on your case: option A.
{<pointid, {neighborids}>}   <-- list of neigborids per point id, with
pointid as key.
option B {<pointid, neighborid} <-- one neighborid per  point id, with
pointid + neighborid as key.

perhaps also helpful foor google etc.: - a distance function if more often
called a similarity function
- top-n 'points' for a given point are usually called its neighbors. - in
most cases you don't have to take the sqrt in Chris' implementation which
can save a lot (but instead  do:  if(\$SumSq <=(\$r*\$r)){//code here}

2010/3/30 Chris W <4rfvgy7@stripped>

> I'm not sure why, but it seems that some people, I don't mean to imply that
> you are one of them, think there is some magic MySQL can preform to find
> points with in a given radius using the GIS extension.  There is no magic.
>  They simply use the well known math required to determine what points are
> inside the circle.  I could be wrong but I doubt there is any way to create
> an index that can directly indicate points with a a certain distance of
> other points unless that index included the distance from every point to
> every other point.  That is obviously not practical since with a set of only
> 14 points the index would have over 6 billion entries.
>
> lets call each of your dimensions d1, d2, d3 .... d96. If you create an
> index on d1, d2, .... d69, you can then create a simple query that will
> quickly find all points that will find all points that are with in a
> bounding box.  Since this query is going to get a bit large with 96
> dimensions, I would use code to create the query.  I will use php.  Let's
> an array TestPointD[1] = x, TestPointD[2] = . . .
>
>
> \$select = 'SELECT `PointID`, ';
> \$where = 'WHERE ';
> foreach(\$TestPointD as \$i => \$d){
>  \$di = 'd' . "\$i";
>  \$select .= "`\$di`, "
>  \$MinD = \$d - \$r;
>  \$MaxD = \$d + \$r;
>  \$where .= "`\$di` >= '\$MinD' AND `\$di` <= '\$MaxD' AND ";
> }
> \$select = substr(\$select, 0, -2);  //trim of the trailing comma and space
> \$where = substr(\$where, 0, -4);  //trim off the trailing 'AND '
>
> \$query = "\$select FROM `points` \$where";
>
>
> Obviously this is going to give you points outside the sphere but still
> inside the cube.  However it will narrow down the set so the further math
> will not take as long.  If this were 3 dimensions with an uniform
> distribution of points, about 52% of the points returned by that query will
> be inside the sphere.  I'm not sure how to calculate the ratio of the volume
> sphere to a cube in 96 dimensions.    Then it will be a simple loop to find
> the points you really want.   While this query will likely return a lot of
> points that you don't want especially in 96D space, it will reduce it enough
> that the following loop will be much faster than looking all points in the
> table.
>
>
>
> \$result = mysql_query(\$query) or die("DB error \$query " . mysql_error() );
> while((\$row = mysql_fetch_row(\$result))){
>  \$sum
>  foreach(\$row as \$i => \$d){
>   if(\$i == 0){
>     \$PointID = \$d;
>     continue; // skip point id at \$row[0]
>   }
>   \$SumSq += pow(\$TestPointD[\$i] - \$d, 2);
>  }
>  if(sqrt(\$SumSq) <= \$r){
>   print "\$PointID is with in \$r of test point.\n";
>  }
> }
>
>
> In an application I had that was similar (but in 2D) I would insert the id
> of the points that passed the condition into a temp table.  Then I could
> join that temp table to other tables do other queries I may need on those
> points.
>
> Chris W
>
>
>
> Werner Van Belle wrote:
>
>> Hello,
>>
>> I have been pondering this for a while, but never really looked deeply
>> into the problem.
>>
>> I have 96 dimensional points and I would like to pose queries such as:
>> 'give me all points that are within such a radius of this one'. The gis
>> extensions to mysql might support such type of query. The problem is of
>> course that points are 2 dimensional and I'm not sure whether I can
>> extend it to more than 3 dimensions ?
>>
>>
>> Wkr,
>>
>>
>>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
>
>

```