List: | General Discussion | « Previous MessageNext Message » | |

From: | Werner Van Belle | Date: | March 31 2010 5:05am |

Subject: | Re: How to deal with 96 Dimensional Points ? | ||

View as plain text |

Geert-Jan Brits wrote: > Perhaps you could give us a (generalized) description of your use-case, so > we can better grasp what you want to achieve, and how you want to use it. > i.e: since I can't imagine/ envison a real 'eucledian distance' over 96 > dimensions I bet you're talking a generalized distance function over N > dimenions. > This is usually only used in two general ways afaik: > 2 calculating an ordered top-M list of closests points to the > target point (Chris' implementation slightly altered) This is indeed the situation. A small alteration to chris his implementation won't do, since we do not know with radius to start with, so it is not just a matter of adapting the post-filtering. > 3 (hmm maybe three: > clustering points based on their distance to eachother) > Yes, this is part of the usecase, but at the moment not my main focus. A statistical approach will need to employed for that, without going for full aggregation. > It helps if we know what you'r after. For instance: if you're points don't > change often and you want to achieve case 1 or 2 I would calculate these > once and all-at-once and save them in a seperate table, bc. the on-demand > variant may quickly become too slow. again depending on your case: option A. > {<pointid, {neighborids}>} <-- list of neigborids per point id, with > pointid as key. > option B {<pointid, neighborid} <-- one neighborid per point id, with > pointid + neighborid as key. > That is not an option. Every 2 minutes or so the next point is randomly choosen and we need a collection of points in the neighboorhood. > perhaps also helpful foor google etc.: - a distance function if more often > called a similarity function > - top-n 'points' for a given point are usually called its neighbors. - in > most cases you don't have to take the sqrt in Chris' implementation which > can save a lot (but instead do: if($SumSq <=($r*$r)){//code here} > Indeed, but this is only a fraction of the time. The larger problem lies in searching all points that have potential. An idea that might work is to modify the radius of what we are looking at while we are searching based on the maximum radius we have so far and cut down distance comparisons if they will surely fall outside the current N closest neighbours. -- http://werner.yellowcouch.org/ Attachment: [application/pgp-signature] OpenPGP digital signature signature.asc

Thread | ||
---|---|---|

• How to deal with 96 Dimensional Points ? | Werner Van Belle | 30 Mar |

• Re: How to deal with 96 Dimensional Points ? | Chris W | 30 Mar |

• Re: How to deal with 96 Dimensional Points ? | Geert-Jan Brits | 31 Mar |

• Re: How to deal with 96 Dimensional Points ? | Werner Van Belle | 31 Mar |

• Re: How to deal with 96 Dimensional Points ? | Werner Van Belle | 31 Mar |

• Re: How to deal with 96 Dimensional Points ? | Chris W | 31 Mar |

• Re: How to deal with 96 Dimensional Points ? | Werner Van Belle | 30 Mar |

• Re: How to deal with 96 Dimensional Points ? | Werner Van Belle | 30 Mar |