From: Chris W Date: March 31 2010 5:22am Subject: Re: How to deal with 96 Dimensional Points ? List-Archive: http://lists.mysql.com/mysql/221102 Message-Id: <4BB2DC21.9070608@cox.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Here is an idea, I'm not going to code this one:) It's still not an ideal solution because it has to make assumptions about your data set. Execute the algorithm I outlined previously with a very small r value, if you didn't find the number of points you are looking for, increase r and modify the query slightly so it doesn't return any of the points the first query returned .... something like AND `PointID` NOT in ('34', '56', '67', . . .). At every step along the way insert the point id of the points inside of r along with the distance they are from the test point, once you have over 100 records in this table stop increasing r and query the temp table sorted by distance with a limit of 100. Of course you have to have some knowledge of your data set to get a reasonable start value for r and a reasonable method for determining how much to increase it each time. On the other hand a minor modification seems better. By inserting all the points in the cube along with their distance in the temp table, a query like SELECT count(*) FROM temp WHERE `Distance` <= r Would be a good way to see if you need to continue to the next round. Also doing it that way, instead of using the NOT IN syntax, which I understand can be slow, you can modify the where condition to find points that are inside the current cube of size r but are outside the previous cube. Chris W Werner Van Belle wrote: > Hello Chris, > > The use case I' > m talking about is actually a typical usecase for GIS applications: give > me the x closest points to this one. E.g: give me the 10 points closest > to (1,2,79) or in my case: give me the 100 points closest to > (x1,....x96). A query like yours might be possible and might be a good > solution if we would know the radius in which we are looking for the > points, but this is not really the case: we merely want a list returned > ordered by distance. Solving this with your solution is possible but is > quite slow. There exists nice datastructures to deal with this type of > problem as said and these are used in the GIS implementation in MySql. > > Chris W wrote: > >> I'm not sure why, but it seems that some people, I don't mean to imply >> that you are one of them, think there is some magic MySQL can preform >> to find points with in a given radius using the GIS extension. There >> is no magic. They simply use the well known math required to >> determine what points are inside the circle. >> > GIS extenstions are also not only about distances: the above query is > better solved with specialized datastructures. > >> I could be wrong but I doubt there is any way to create an index that >> can directly indicate points with a a certain distance of other points >> unless that index included the distance from every point to every >> other point. That is obviously not practical since with a set of only >> 14 points the index would have over 6 billion entries. >> > Partitioning of the space such as done in 3D render engines do solve > this problem more efficiently than having a list of all pairtwise > distances. So the question is not whether such algorithms exist, it is > rather whether they are available in/through MySql. > > >> lets call each of your dimensions d1, d2, d3 .... d96. If you create >> an index on d1, d2, .... d69, you can then create a simple query that >> will quickly find all points that will find all points that are with >> in a bounding box. Since this query is going to get a bit large with >> 96 dimensions, I would use code to create the query. I will use php. >> Let's start with the desired radius being r and the test point >> dimensions being in an array TestPointD[1] = x, TestPointD[2] = . . . >> >> $select = 'SELECT `PointID`, '; >> $where = 'WHERE '; >> foreach($TestPointD as $i => $d){ >> $di = 'd' . "$i"; >> $select .= "`$di`, " >> $MinD = $d - $r; >> $MaxD = $d + $r; >> $where .= "`$di` >= '$MinD' AND `$di` <= '$MaxD' AND "; >> } >> $select = substr($select, 0, -2); //trim of the trailing comma and space >> $where = substr($where, 0, -4); //trim off the trailing 'AND ' >> >> $query = "$select FROM `points` $where"; >> >> > Thanks for the nice illustration. In this case with the proper indices > this will indeed split the space in sections; nevertheless this approach > has great difficulties returning an ordered list of distances and > prefereably only the 100 closest ones at that. > > Wkr, > >