From: Werner Van Belle
Date: March 31 2010 5:05am
Subject: Re: How to deal with 96 Dimensional Points ?
List-Archive: http://lists.mysql.com/mysql/221101
Message-Id: <4BB2D821.4070106@yellowcouch.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature";
boundary="------------enigD424F525627623BE0E306EEF"
--------------enigD424F525627623BE0E306EEF
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Geert-Jan Brits wrote:
> Perhaps you could give us a (generalized) description of your use-case,=
so
> we can better grasp what you want to achieve, and how you want to use i=
t.
> i.e: since I can't imagine/ envison a real 'eucledian distance' over 96=
> dimensions I bet you're talking a generalized distance function over N
> dimenions.
> This is usually only used in two general ways afaik:=20
> 2 calculating an ordered top-M list of closests points to the
> target point (Chris' implementation slightly altered)=20
This is indeed the situation. A small alteration to chris his
implementation won't do, since we do not know with radius to start with,
so it is not just a matter of adapting the post-filtering.
> 3 (hmm maybe three:
> clustering points based on their distance to eachother)
> =20
Yes, this is part of the usecase, but at the moment not my main focus. A
statistical approach will need to employed for that, without going for
full aggregation.
> It helps if we know what you'r after. For instance: if you're points do=
n't
> change often and you want to achieve case 1 or 2 I would calculate the=
se
> once and all-at-once and save them in a seperate table, bc. the on-dema=
nd
> variant may quickly become too slow. again depending on your case: opti=
on A.
> {} <-- list of neigborids per point id, with
> pointid as key.
> option B { pointid + neighborid as key.
> =20
That is not an option. Every 2 minutes or so the next point is randomly
choosen and we need a collection of points in the neighboorhood.
> perhaps also helpful foor google etc.: - a distance function if more of=
ten
> called a similarity function
> - top-n 'points' for a given point are usually called its neighbors. - =
in
> most cases you don't have to take the sqrt in Chris' implementation whi=
ch
> can save a lot (but instead do: if($SumSq <=3D($r*$r)){//code here}
> =20
Indeed, but this is only a fraction of the time. The larger problem lies
in searching all points that have potential. An idea that might work is
to modify the radius of what we are looking at while we are searching
based on the maximum radius we have so far and cut down distance
comparisons if they will surely fall outside the current N closest
neighbours.
--=20
http://werner.yellowcouch.org/
--------------enigD424F525627623BE0E306EEF
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkuy2CEACgkQ2TDF0fTmy3WmhQCfU2RGNQ99qaGEUljny9D63oR8
82wAoIPdW7DFkqLGHR2OOf9cYfcHxbr4
=aGDj
-----END PGP SIGNATURE-----
--------------enigD424F525627623BE0E306EEF--