List:General Discussion« Previous MessageNext Message »
From:Chris W Date:March 31 2010 5:22am
Subject:Re: How to deal with 96 Dimensional Points ?
View as plain text  
Here is an idea, I'm not going to code this one:)  It's still not an 
ideal solution because it has to make assumptions about your data set.  
Execute the algorithm I outlined previously with a very small r value, 
if you didn't find the number of points you are looking for, increase r 
and modify the query slightly so it doesn't return any of the points the 
first query returned .... something like AND `PointID` NOT in ('34', 
'56', '67', . . .).
At every step along the way insert the point id of the points inside of 
r along with the distance they are from the test point, once you have 
over 100 records in this table stop increasing r and query the temp 
table sorted by distance with a limit of 100.  Of course you have to 
have some knowledge of your data set to get a reasonable start value for 
r and a reasonable method for determining how much to increase it each time.

On the other hand a minor modification seems better.  By inserting all 
the points in the cube along with their distance in the temp table, a 
query like SELECT count(*) FROM temp WHERE `Distance` <= r Would be a 
good way to see if you need to continue to the next round.  Also doing 
it that way, instead of using the NOT IN syntax, which I understand can 
be slow, you can modify the where condition to find points that are 
inside the current cube of size r but are outside the previous cube.

Chris W

Werner Van Belle wrote:
> Hello Chris,
>
> The use case I'
> m talking about is actually a typical usecase for GIS applications: give
> me the x closest points to this one. E.g: give me the 10 points closest
> to (1,2,79) or in my case: give me the 100 points closest to
> (x1,....x96). A query like yours might be possible and might be a good
> solution if we would know the radius in which we are looking for the
> points, but this is not really the case: we merely want a list returned
> ordered by distance. Solving this with your solution is possible but is
> quite slow. There exists nice datastructures to deal with this type of
> problem as said and these are used in the GIS implementation in MySql.
>
> Chris W wrote:
>   
>> I'm not sure why, but it seems that some people, I don't mean to imply
>> that you are one of them, think there is some magic MySQL can preform
>> to find points with in a given radius using the GIS extension.  There
>> is no magic.  They simply use the well known math required to
>> determine what points are inside the circle.
>>     
> GIS extenstions are also not only about distances: the above query is
> better solved with specialized datastructures.
>   
>> I could be wrong but I doubt there is any way to create an index that
>> can directly indicate points with a a certain distance of other points
>> unless that index included the distance from every point to every
>> other point.  That is obviously not practical since with a set of only
>> 14 points the index would have over 6 billion entries.
>>     
> Partitioning of the space such as done in 3D render engines do solve
> this problem more efficiently than having a list of all pairtwise
> distances.  So the question is not whether such algorithms exist, it is
> rather whether they are available in/through MySql.
>
>   
>> lets call each of your dimensions d1, d2, d3 .... d96. If you create
>> an index on d1, d2, .... d69, you can then create a simple query that
>> will quickly find all points that will find all points that are with
>> in a bounding box.  Since this query is going to get a bit large with
>> 96 dimensions, I would use code to create the query.  I will use php. 
>> Let's start with the desired radius being r and the test point
>> dimensions being in an array TestPointD[1] = x, TestPointD[2] = . . .
>>
>> $select = 'SELECT `PointID`, ';
>> $where = 'WHERE ';
>> foreach($TestPointD as $i => $d){
>>  $di = 'd' . "$i";
>>  $select .= "`$di`, "
>>  $MinD = $d - $r;
>>  $MaxD = $d + $r;
>>  $where .= "`$di` >= '$MinD' AND `$di` <= '$MaxD' AND ";
>> }
>> $select = substr($select, 0, -2);  //trim of the trailing comma and space
>> $where = substr($where, 0, -4);  //trim off the trailing 'AND '
>>
>> $query = "$select FROM `points` $where";
>>
>>     
> Thanks for the nice illustration. In this case with the proper indices
> this will indeed split the space in sections; nevertheless this approach
> has great difficulties returning an ordered list of distances and
> prefereably only the 100 closest ones at that.
>
> Wkr,
>
>   
Thread
How to deal with 96 Dimensional Points ?Werner Van Belle30 Mar
  • Re: How to deal with 96 Dimensional Points ?Chris W30 Mar
    • Re: How to deal with 96 Dimensional Points ?Geert-Jan Brits31 Mar
      • Re: How to deal with 96 Dimensional Points ?Werner Van Belle31 Mar
    • Re: How to deal with 96 Dimensional Points ?Werner Van Belle31 Mar
      • Re: How to deal with 96 Dimensional Points ?Chris W31 Mar
Re: How to deal with 96 Dimensional Points ?Werner Van Belle30 Mar
Re: How to deal with 96 Dimensional Points ?Werner Van Belle30 Mar