From: Werner Van Belle Date: March 31 2010 4:58am Subject: Re: How to deal with 96 Dimensional Points ? List-Archive: http://lists.mysql.com/mysql/221100 Message-Id: <4BB2D68C.6090808@yellowcouch.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigE135D76F1B1C9B391221AC18" --------------enigE135D76F1B1C9B391221AC18 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hello Chris, The use case I' m talking about is actually a typical usecase for GIS applications: give me the x closest points to this one. E.g: give me the 10 points closest to (1,2,79) or in my case: give me the 100 points closest to (x1,....x96). A query like yours might be possible and might be a good solution if we would know the radius in which we are looking for the points, but this is not really the case: we merely want a list returned ordered by distance. Solving this with your solution is possible but is quite slow. There exists nice datastructures to deal with this type of problem as said and these are used in the GIS implementation in MySql. Chris W wrote: > I'm not sure why, but it seems that some people, I don't mean to imply > that you are one of them, think there is some magic MySQL can preform > to find points with in a given radius using the GIS extension. There > is no magic. They simply use the well known math required to > determine what points are inside the circle. GIS extenstions are also not only about distances: the above query is better solved with specialized datastructures. > I could be wrong but I doubt there is any way to create an index that > can directly indicate points with a a certain distance of other points > unless that index included the distance from every point to every > other point. That is obviously not practical since with a set of only > 14 points the index would have over 6 billion entries. Partitioning of the space such as done in 3D render engines do solve this problem more efficiently than having a list of all pairtwise distances. So the question is not whether such algorithms exist, it is rather whether they are available in/through MySql. > lets call each of your dimensions d1, d2, d3 .... d96. If you create > an index on d1, d2, .... d69, you can then create a simple query that > will quickly find all points that will find all points that are with > in a bounding box. Since this query is going to get a bit large with > 96 dimensions, I would use code to create the query. I will use php.=20 > Let's start with the desired radius being r and the test point > dimensions being in an array TestPointD[1] =3D x, TestPointD[2] =3D . .= . > > \$select =3D 'SELECT `PointID`, '; > \$where =3D 'WHERE '; > foreach(\$TestPointD as \$i =3D> \$d){ > \$di =3D 'd' . "\$i"; > \$select .=3D "`\$di`, " > \$MinD =3D \$d - \$r; > \$MaxD =3D \$d + \$r; > \$where .=3D "`\$di` >=3D '\$MinD' AND `\$di` <=3D '\$MaxD' AND "; > } > \$select =3D substr(\$select, 0, -2); //trim of the trailing comma and s= pace > \$where =3D substr(\$where, 0, -4); //trim off the trailing 'AND ' > > \$query =3D "\$select FROM `points` \$where"; > Thanks for the nice illustration. In this case with the proper indices this will indeed split the space in sections; nevertheless this approach has great difficulties returning an ordered list of distances and prefereably only the 100 closest ones at that. Wkr, --=20 http://werner.yellowcouch.org/ --------------enigE135D76F1B1C9B391221AC18 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkuy1o8ACgkQ2TDF0fTmy3WyyACfbnJ5Zm9DnHmC/RgkOoxsSO3M F70AnipAMM23zFmDNL6knaBO9SsTxUbQ =sDFY -----END PGP SIGNATURE----- --------------enigE135D76F1B1C9B391221AC18--