MySQL Lists are EOL. Please join:

 List: General Discussion « Previous MessageNext Message » From: Chris W Date: March 31 2010 5:22am Subject: Re: How to deal with 96 Dimensional Points ? View as plain text
Here is an idea, I'm not going to code this one:)  It's still not an
ideal solution because it has to make assumptions about your data set.
Execute the algorithm I outlined previously with a very small r value,
if you didn't find the number of points you are looking for, increase r
and modify the query slightly so it doesn't return any of the points the
first query returned .... something like AND `PointID` NOT in ('34',
'56', '67', . . .).
At every step along the way insert the point id of the points inside of
r along with the distance they are from the test point, once you have
over 100 records in this table stop increasing r and query the temp
table sorted by distance with a limit of 100.  Of course you have to
have some knowledge of your data set to get a reasonable start value for
r and a reasonable method for determining how much to increase it each time.

On the other hand a minor modification seems better.  By inserting all
the points in the cube along with their distance in the temp table, a
query like SELECT count(*) FROM temp WHERE `Distance` <= r Would be a
good way to see if you need to continue to the next round.  Also doing
it that way, instead of using the NOT IN syntax, which I understand can
be slow, you can modify the where condition to find points that are
inside the current cube of size r but are outside the previous cube.

Chris W

Werner Van Belle wrote:
> Hello Chris,
>
> The use case I'
> m talking about is actually a typical usecase for GIS applications: give
> me the x closest points to this one. E.g: give me the 10 points closest
> to (1,2,79) or in my case: give me the 100 points closest to
> (x1,....x96). A query like yours might be possible and might be a good
> solution if we would know the radius in which we are looking for the
> points, but this is not really the case: we merely want a list returned
> ordered by distance. Solving this with your solution is possible but is
> quite slow. There exists nice datastructures to deal with this type of
> problem as said and these are used in the GIS implementation in MySql.
>
> Chris W wrote:
>
>> I'm not sure why, but it seems that some people, I don't mean to imply
>> that you are one of them, think there is some magic MySQL can preform
>> to find points with in a given radius using the GIS extension.  There
>> is no magic.  They simply use the well known math required to
>> determine what points are inside the circle.
>>
> GIS extenstions are also not only about distances: the above query is
> better solved with specialized datastructures.
>
>> I could be wrong but I doubt there is any way to create an index that
>> can directly indicate points with a a certain distance of other points
>> unless that index included the distance from every point to every
>> other point.  That is obviously not practical since with a set of only
>> 14 points the index would have over 6 billion entries.
>>
> Partitioning of the space such as done in 3D render engines do solve
> this problem more efficiently than having a list of all pairtwise
> distances.  So the question is not whether such algorithms exist, it is
> rather whether they are available in/through MySql.
>
>
>> lets call each of your dimensions d1, d2, d3 .... d96. If you create
>> an index on d1, d2, .... d69, you can then create a simple query that
>> will quickly find all points that will find all points that are with
>> in a bounding box.  Since this query is going to get a bit large with
>> 96 dimensions, I would use code to create the query.  I will use php.
>> dimensions being in an array TestPointD[1] = x, TestPointD[2] = . . .
>>
>> \$select = 'SELECT `PointID`, ';
>> \$where = 'WHERE ';
>> foreach(\$TestPointD as \$i => \$d){
>>  \$di = 'd' . "\$i";
>>  \$select .= "`\$di`, "
>>  \$MinD = \$d - \$r;
>>  \$MaxD = \$d + \$r;
>>  \$where .= "`\$di` >= '\$MinD' AND `\$di` <= '\$MaxD' AND ";
>> }
>> \$select = substr(\$select, 0, -2);  //trim of the trailing comma and space
>> \$where = substr(\$where, 0, -4);  //trim off the trailing 'AND '
>>
>> \$query = "\$select FROM `points` \$where";
>>
>>
> Thanks for the nice illustration. In this case with the proper indices
> this will indeed split the space in sections; nevertheless this approach
> has great difficulties returning an ordered list of distances and
> prefereably only the 100 closest ones at that.
>
> Wkr,
>
>