List:General Discussion« Previous MessageNext Message »
From:Geert-Jan Brits Date:March 30 2010 11:54pm
Subject:Re: How to deal with 96 Dimensional Points ?
View as plain text  
Perhaps you could give us a (generalized) description of your use-case, so
we can better grasp what you want to achieve, and how you want to use it.
i.e: since I can't imagine/ envison a real 'eucledian distance' over 96
dimensions I bet you're talking a generalized distance function over N
This is usually only used in two general ways afaik: 1 calculating all
points that lie within a certain threshold (Chris' implementation prints
these points) 2 calculating an ordered top-M list of closests points to the
target point (Chris' implementation slightly altered) 3 (hmm maybe three:
clustering points based on their distance to eachother)
It helps if we know what you'r after. For instance: if you're points don't
change often and you want to achieve case 1 or 2  I would calculate these
once and all-at-once and save them in a seperate table, bc. the on-demand
variant may quickly become too slow. again depending on your case: option A.
{<pointid, {neighborids}>}   <-- list of neigborids per point id, with
pointid as key.
option B {<pointid, neighborid} <-- one neighborid per  point id, with
pointid + neighborid as key.

perhaps also helpful foor google etc.: - a distance function if more often
called a similarity function
- top-n 'points' for a given point are usually called its neighbors. - in
most cases you don't have to take the sqrt in Chris' implementation which
can save a lot (but instead  do:  if($SumSq <=($r*$r)){//code here}

2010/3/30 Chris W <4rfvgy7@stripped>

> I'm not sure why, but it seems that some people, I don't mean to imply that
> you are one of them, think there is some magic MySQL can preform to find
> points with in a given radius using the GIS extension.  There is no magic.
>  They simply use the well known math required to determine what points are
> inside the circle.  I could be wrong but I doubt there is any way to create
> an index that can directly indicate points with a a certain distance of
> other points unless that index included the distance from every point to
> every other point.  That is obviously not practical since with a set of only
> 14 points the index would have over 6 billion entries.
> lets call each of your dimensions d1, d2, d3 .... d96. If you create an
> index on d1, d2, .... d69, you can then create a simple query that will
> quickly find all points that will find all points that are with in a
> bounding box.  Since this query is going to get a bit large with 96
> dimensions, I would use code to create the query.  I will use php.  Let's
> start with the desired radius being r and the test point dimensions being in
> an array TestPointD[1] = x, TestPointD[2] = . . .
> $select = 'SELECT `PointID`, ';
> $where = 'WHERE ';
> foreach($TestPointD as $i => $d){
>  $di = 'd' . "$i";
>  $select .= "`$di`, "
>  $MinD = $d - $r;
>  $MaxD = $d + $r;
>  $where .= "`$di` >= '$MinD' AND `$di` <= '$MaxD' AND ";
> }
> $select = substr($select, 0, -2);  //trim of the trailing comma and space
> $where = substr($where, 0, -4);  //trim off the trailing 'AND '
> $query = "$select FROM `points` $where";
> Obviously this is going to give you points outside the sphere but still
> inside the cube.  However it will narrow down the set so the further math
> will not take as long.  If this were 3 dimensions with an uniform
> distribution of points, about 52% of the points returned by that query will
> be inside the sphere.  I'm not sure how to calculate the ratio of the volume
> sphere to a cube in 96 dimensions.    Then it will be a simple loop to find
> the points you really want.   While this query will likely return a lot of
> points that you don't want especially in 96D space, it will reduce it enough
> that the following loop will be much faster than looking all points in the
> table.
> $result = mysql_query($query) or die("DB error $query " . mysql_error() );
> while(($row = mysql_fetch_row($result))){
>  $sum
>  foreach($row as $i => $d){
>   if($i == 0){
>     $PointID = $d;
>     continue; // skip point id at $row[0]
>   }
>   $SumSq += pow($TestPointD[$i] - $d, 2);
>  }
>  if(sqrt($SumSq) <= $r){
>   print "$PointID is with in $r of test point.\n";
>  }
> }
> In an application I had that was similar (but in 2D) I would insert the id
> of the points that passed the condition into a temp table.  Then I could
> join that temp table to other tables do other queries I may need on those
> points.
> Chris W
> Werner Van Belle wrote:
>> Hello,
>> I have been pondering this for a while, but never really looked deeply
>> into the problem.
>> I have 96 dimensional points and I would like to pose queries such as:
>> 'give me all points that are within such a radius of this one'. The gis
>> extensions to mysql might support such type of query. The problem is of
>> course that points are 2 dimensional and I'm not sure whether I can
>> extend it to more than 3 dimensions ?
>> Does anybody have an idea about this ?
>> Wkr,
> --
> MySQL General Mailing List
> For list archives:
> To unsubscribe:

How to deal with 96 Dimensional Points ?Werner Van Belle30 Mar
  • Re: How to deal with 96 Dimensional Points ?Chris W30 Mar
    • Re: How to deal with 96 Dimensional Points ?Geert-Jan Brits31 Mar
      • Re: How to deal with 96 Dimensional Points ?Werner Van Belle31 Mar
    • Re: How to deal with 96 Dimensional Points ?Werner Van Belle31 Mar
      • Re: How to deal with 96 Dimensional Points ?Chris W31 Mar
Re: How to deal with 96 Dimensional Points ?Werner Van Belle30 Mar
Re: How to deal with 96 Dimensional Points ?Werner Van Belle30 Mar