http://www.gedpage.com/soundex.html offers a simple explanation of what soundex() does.
One possibility would be to build a reference table with only a recordID and a soundex
column, with a unique key over both, and fill it with the soundex of each individual
non-junk word.
So, from the titles

recordID | title
1 | Rain in Spain
2 | Spain's Rain

you would get the rows

recordID | soundex
1 | R500
1 | S150
2 | S150
2 | R500
From there, you can see that both titles use the same words. A lot of spelling errors,
like "Spian", are ignored as well because they produce the same code. Obviously not a
magic solution, but it's a start.
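
To make the idea concrete, here is a minimal sketch in Python rather than SQL. It assumes the classic four-character American Soundex, which may differ in detail from what MySQL's soundex() returns, plus a made-up junk-word list and a rule that strips a possessive 's. The names soundex, index_rows and JUNK are only for illustration.

```python
import re

# Standard American Soundex digit groups: index + 1 is the digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

# Hypothetical junk-word list; a real one would be tuned to the data.
JUNK = {"a", "an", "and", "in", "of", "the"}


def soundex(word):
    """Return the four-character American Soundex code of an alphabetic word."""
    word = word.lower()
    code = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]  # pad with zeros, then truncate to four characters


def index_rows(titles):
    """Build the (recordID, soundex) rows; a set keeps them unique over both."""
    rows = set()
    for record_id, title in titles.items():
        for word in re.findall(r"[a-z]+(?:'[a-z]+)?", title.lower()):
            word = word.split("'")[0]  # assumption: drop a possessive suffix
            if word not in JUNK:
                rows.add((record_id, soundex(word)))
    return rows


titles = {1: "Rain in Spain", 2: "Spain's Rain"}
for row in sorted(index_rows(titles)):
    print(row)
```

Two records whose sets of codes are equal, or overlap heavily, are then candidates for being the same title. In the database the same comparison would be a self-join of the reference table on the soundex column.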
----- Original Message -----
> From: "Jerry Schwartz" <jerry@stripped>
> To: "Johan De Meersman" <vegivamp@stripped>
> Cc: "Jim McNeely" <jim@stripped>, "mysql mailing list"
> Sent: Monday, 2 May, 2011 4:09:36 PM
> Subject: RE: Join based upon LIKE
> [JS] I've thought about using soundex(), but I'm not quite sure how.
> I didn't pursue it much because there are so many odd terms such as
> names, but perhaps I should give it a try in my infinite free time.
> [JS] Thanks for your condolences.
> Jerry Schwartz
> Global Information Incorporated
> 195 Farmington Ave.
> Farmington, CT 06032
> 860.674.8796 / FAX: 860.674.8341
> E-mail: jerry@stripped
> Web site: www.the-infoshop.com
Beer with grenadine
Is like mustard with wine
She who drinks it is a prude
He who drinks it is soon an ass