List: General Discussion « Previous MessageNext Message » From: Werner Van Belle Date: March 31 2010 5:05am Subject: Re: How to deal with 96 Dimensional Points ? View as plain text
```Geert-Jan Brits wrote:
> Perhaps you could give us a (generalized) description of your use-case, so
> we can better grasp what you want to achieve, and how you want to use it.
> i.e: since I can't imagine/ envison a real 'eucledian distance' over 96
> dimensions I bet you're talking a generalized distance function over N
> dimenions.
> This is usually only used in two general ways afaik:
> 2 calculating an ordered top-M list of closests points to the
> target point (Chris' implementation slightly altered)
This is indeed the situation. A small alteration to chris his
so it is not just a matter of adapting the post-filtering.

> 3 (hmm maybe three:
> clustering points based on their distance to eachother)
>
Yes, this is part of the usecase, but at the moment not my main focus. A
statistical approach will need to employed for that, without going for
full aggregation.
> It helps if we know what you'r after. For instance: if you're points don't
> change often and you want to achieve case 1 or 2  I would calculate these
> once and all-at-once and save them in a seperate table, bc. the on-demand
> variant may quickly become too slow. again depending on your case: option A.
> {<pointid, {neighborids}>}   <-- list of neigborids per point id, with
> pointid as key.
> option B {<pointid, neighborid} <-- one neighborid per  point id, with
> pointid + neighborid as key.
>
That is not an option. Every 2 minutes or so the next point is randomly
choosen and we need a collection of points in the neighboorhood.
> perhaps also helpful foor google etc.: - a distance function if more often
> called a similarity function
> - top-n 'points' for a given point are usually called its neighbors. - in
> most cases you don't have to take the sqrt in Chris' implementation which
> can save a lot (but instead  do:  if(\$SumSq <=(\$r*\$r)){//code here}
>
Indeed, but this is only a fraction of the time. The larger problem lies
in searching all points that have potential. An idea that might work is
to modify the radius of what we are looking at while we are searching
based on the maximum radius we have so far and cut down distance
comparisons if they will surely fall outside the current N closest
neighbours.

--
http://werner.yellowcouch.org/

Attachment: [application/pgp-signature] OpenPGP digital signature signature.asc
```