List: General Discussion
From: Johan De Meersman
Date: May 3 2011 2:59pm
Subject: Re: Join based upon LIKE
----- Original Message -----
> From: "Jerry Schwartz" <jerry@stripped>
> 
> I'm not sure that I could easily build a dictionary of non-junk
> words, since

The traditional way is to build a database of junk words. The list tends to be shorter :-)

Think and/or/it/the/with/like/...

Percentages of mutual and non-mutual words between two titles should be a reasonable
indicator of likeness. You could conceivably even assign value to individual words, so
"polypropylbutanate" is more useful than "synergy" for comparison purposes.

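A minimal sketch of the idea above, in Python rather than SQL to keep it short. The junk-word list, the function names and the weights are all illustrative assumptions, not something from the thread; the score is simply the weighted share of mutual words among all non-junk words in either title.

```python
import re

# Hypothetical junk-word list; a real one would be tuned to the actual titles.
JUNK_WORDS = {"and", "or", "it", "the", "with", "like", "a", "an", "of", "in", "for"}


def significant_words(title):
    """Lowercase the title, split on non-alphanumerics, and drop junk words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in JUNK_WORDS}


def likeness(title_a, title_b, weights=None):
    """Weighted share of mutual words among all words in either title (0.0 to 1.0).

    weights maps a word to its value (assumed, hand-assigned);
    words not listed count as 1.0.
    """
    weights = weights or {}
    a = significant_words(title_a)
    b = significant_words(title_b)
    mutual = sum(weights.get(w, 1.0) for w in a & b)
    total = sum(weights.get(w, 1.0) for w in a | b)
    return mutual / total if total else 0.0


# Usage: a rare shared word can be made to count for more than a common one.
plain = likeness("Polypropylbutanate Outlook", "Polypropylbutanate Trends")
weighted = likeness(
    "Polypropylbutanate Outlook",
    "Polypropylbutanate Trends",
    weights={"polypropylbutanate": 4.0},
)
```

With no weights, the two example titles share one of three distinct words; giving "polypropylbutanate" a higher weight raises the score, which is the effect described above. In a database, the same comparison would probably need the titles split into a word table first.
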
All very theoretical, though; I haven't actually done much of it at this level. My
experience in data mangling is mostly limited to should-be-fixed-format data like sports
results.


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel