From: Chris W
Date: March 31 2010 5:22am
Subject: Re: How to deal with 96 Dimensional Points ?
List-Archive: http://lists.mysql.com/mysql/221102
Message-Id: <4BB2DC21.9070608@cox.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Here is an idea, I'm not going to code this one:) It's still not an
ideal solution because it has to make assumptions about your data set.
Execute the algorithm I outlined previously with a very small r value,
if you didn't find the number of points you are looking for, increase r
and modify the query slightly so it doesn't return any of the points the
first query returned .... something like AND `PointID` NOT in ('34',
'56', '67', . . .).
At every step along the way insert the point id of the points inside of
r along with the distance they are from the test point, once you have
over 100 records in this table stop increasing r and query the temp
table sorted by distance with a limit of 100. Of course you have to
have some knowledge of your data set to get a reasonable start value for
r and a reasonable method for determining how much to increase it each time.
On the other hand a minor modification seems better. By inserting all
the points in the cube along with their distance in the temp table, a
query like SELECT count(*) FROM temp WHERE `Distance` <= r Would be a
good way to see if you need to continue to the next round. Also doing
it that way, instead of using the NOT IN syntax, which I understand can
be slow, you can modify the where condition to find points that are
inside the current cube of size r but are outside the previous cube.
Chris W
Werner Van Belle wrote:
> Hello Chris,
>
> The use case I'
> m talking about is actually a typical usecase for GIS applications: give
> me the x closest points to this one. E.g: give me the 10 points closest
> to (1,2,79) or in my case: give me the 100 points closest to
> (x1,....x96). A query like yours might be possible and might be a good
> solution if we would know the radius in which we are looking for the
> points, but this is not really the case: we merely want a list returned
> ordered by distance. Solving this with your solution is possible but is
> quite slow. There exists nice datastructures to deal with this type of
> problem as said and these are used in the GIS implementation in MySql.
>
> Chris W wrote:
>
>> I'm not sure why, but it seems that some people, I don't mean to imply
>> that you are one of them, think there is some magic MySQL can preform
>> to find points with in a given radius using the GIS extension. There
>> is no magic. They simply use the well known math required to
>> determine what points are inside the circle.
>>
> GIS extenstions are also not only about distances: the above query is
> better solved with specialized datastructures.
>
>> I could be wrong but I doubt there is any way to create an index that
>> can directly indicate points with a a certain distance of other points
>> unless that index included the distance from every point to every
>> other point. That is obviously not practical since with a set of only
>> 14 points the index would have over 6 billion entries.
>>
> Partitioning of the space such as done in 3D render engines do solve
> this problem more efficiently than having a list of all pairtwise
> distances. So the question is not whether such algorithms exist, it is
> rather whether they are available in/through MySql.
>
>
>> lets call each of your dimensions d1, d2, d3 .... d96. If you create
>> an index on d1, d2, .... d69, you can then create a simple query that
>> will quickly find all points that will find all points that are with
>> in a bounding box. Since this query is going to get a bit large with
>> 96 dimensions, I would use code to create the query. I will use php.
>> Let's start with the desired radius being r and the test point
>> dimensions being in an array TestPointD[1] = x, TestPointD[2] = . . .
>>
>> $select = 'SELECT `PointID`, ';
>> $where = 'WHERE ';
>> foreach($TestPointD as $i => $d){
>> $di = 'd' . "$i";
>> $select .= "`$di`, "
>> $MinD = $d - $r;
>> $MaxD = $d + $r;
>> $where .= "`$di` >= '$MinD' AND `$di` <= '$MaxD' AND ";
>> }
>> $select = substr($select, 0, -2); //trim of the trailing comma and space
>> $where = substr($where, 0, -4); //trim off the trailing 'AND '
>>
>> $query = "$select FROM `points` $where";
>>
>>
> Thanks for the nice illustration. In this case with the proper indices
> this will indeed split the space in sections; nevertheless this approach
> has great difficulties returning an ordered list of distances and
> prefereably only the 100 closest ones at that.
>
> Wkr,
>
>