http://www.gedpage.com/soundex.html offers a simple explanation of what soundex() does.
One possibility would be to build a reference table with just two columns, recordID and
soundex, with a unique key over both, and to fill it with the soundex of each individual
non-junk word.
So, from the titles
1 | Rain in Spain
2 | Spain's Rain
you'd get (treating the possessive Spain's as plain Spain)
1 | R500
1 | S150
2 | S150
2 | R500
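The recipe above can be sketched in Python roughly like this. It is a minimal sketch, not MySQL's SOUNDEX(): it implements the standard four-character American Soundex, which gives codes in the same form as the ones shown, and the JUNK_WORDS stop list, the possessive-stripping rule and the helper names soundex, title_words and soundex_rows are all hypothetical choices for illustration. MySQL's own SOUNDEX() can return strings longer than four characters and may code some words a little differently, so don't expect exact agreement with it.

```python
from __future__ import annotations

import re

# Standard American Soundex letter groups; vowels, y, h and w get no digit.
_CODES: dict[str, str] = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

# Hypothetical stop list of "junk" words; tune it to your own titles.
JUNK_WORDS = {"a", "an", "and", "in", "of", "on", "the"}


def soundex(word: str) -> str:
    """Four-character American Soundex code (ASCII letters only)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        # Only emit a digit that differs from the previous letter's digit.
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # h and w do not separate equal digits; vowels (and y) do.
        if ch not in "hw":
            prev = code
    return result.ljust(4, "0")


def title_words(title: str) -> list[str]:
    """Split a title into non-junk words; a possessive 's is dropped (hypothetical rule)."""
    words = []
    for word in re.findall(r"[A-Za-z']+", title):
        word = word.lower()
        if word.endswith("'s"):
            word = word[:-2]
        word = word.strip("'")
        if word and word not in JUNK_WORDS:
            words.append(word)
    return words


def soundex_rows(titles: dict[int, str]) -> set[tuple[int, str]]:
    """(recordID, soundex) pairs; the set plays the role of the unique key over both columns."""
    return {(record_id, soundex(word))
            for record_id, title in titles.items()
            for word in title_words(title)}


if __name__ == "__main__":
    rows = soundex_rows({1: "Rain in Spain", 2: "Spain's Rain"})
    print(sorted(rows))  # [(1, 'R500'), (1, 'S150'), (2, 'R500'), (2, 'S150')]
```

In MySQL itself the same idea would be an INSERT ... SELECT of SOUNDEX() over the split words, but splitting titles into words is awkward in SQL, which is one reason to do that part in application code.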
From there on, you can see that both titles use the same words, while a lot of spelling
errors, like Spian for Spain, are ignored because they map to the same code. Obviously
it's not a magic solution, but it's a start.
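Once the rows exist, spotting titles built from the same words comes down to grouping the codes per record and comparing the groups; in MySQL that would roughly be a self-join of the table on its soundex column, grouped by the two recordIDs. Here is a small Python sketch of that comparison on the rows from the example. The function names are hypothetical, and the shared-code count is an assumed looser score, not part of the original suggestion.

```python
from collections import defaultdict
from itertools import combinations

# The (recordID, soundex) rows from the "Rain in Spain" example.
ROWS = [(1, "R500"), (1, "S150"), (2, "S150"), (2, "R500")]


def codes_by_record(rows):
    """Group the soundex codes of each recordID into a set."""
    grouped = defaultdict(set)
    for record_id, code in rows:
        grouped[record_id].add(code)
    return grouped


def same_word_pairs(rows):
    """Pairs of recordIDs whose titles produce exactly the same set of codes."""
    grouped = codes_by_record(rows)
    return [(a, b) for a, b in combinations(sorted(grouped), 2)
            if grouped[a] == grouped[b]]


def shared_code_counts(rows):
    """For each pair of recordIDs with any overlap, how many codes they share."""
    grouped = codes_by_record(rows)
    counts = {}
    for a, b in combinations(sorted(grouped), 2):
        shared = len(grouped[a] & grouped[b])
        if shared:
            counts[(a, b)] = shared
    return counts


if __name__ == "__main__":
    print(same_word_pairs(ROWS))     # [(1, 2)]
    print(shared_code_counts(ROWS))  # {(1, 2): 2}
```

Requiring identical code sets is strict; ranking pairs by the number of shared codes is more forgiving of an extra or missing word, at the cost of more false positives.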
----- Original Message -----
> From: "Jerry Schwartz" <jerry@stripped>
> To: "Johan De Meersman" <vegivamp@stripped>
> Cc: "Jim McNeely" <jim@stripped>, "mysql mailing list"
> <mysql@stripped>
> Sent: Monday, 2 May, 2011 4:09:36 PM
> Subject: RE: Join based upon LIKE
>
> [JS] I've thought about using soundex(), but I'm not quite sure how.
>
> I didn't pursue it much because there are so many odd terms such as chemical
> names, but perhaps I should give it a try in my infinite free time.
>
>
> [JS] Thanks for your condolences.
>
> Regards,
>
> Jerry Schwartz
> Global Information Incorporated
> 195 Farmington Ave.
> Farmington, CT 06032
>
> 860.674.8796 / FAX: 860.674.8341
> E-mail: jerry@stripped
> Web site: www.the-infoshop.com
>
--
Beer with grenadine
Is like mustard with wine
She who drinks it is a prude
He who drinks it will soon be an ass