List: General Discussion
From: Jerry Schwartz
Date: April 29 2011 5:09pm
Subject: RE: Join based upon LIKE
>-----Original Message-----
>From: Johan De Meersman [mailto:vegivamp@stripped]
>Sent: Friday, April 29, 2011 5:56 AM
>To: Jerry Schwartz
>Cc: mysql mailing list
>Subject: Re: Join based upon LIKE
>
>
>----- Original Message -----
>> From: "Jerry Schwartz" <jerry@stripped>
>>
>> [JS] This isn't the only place I have to deal with fuzzy data. :-(
>> Discretion prohibits further comment.
>
>Heh. What you *really* need, is a LART. Preferably one of the spiked variety.
>
[JS] Unless a LART is a demon of some kind, I don't know what it is.

>> A full-text index would work if I were only looking for one title at
>> a time, but I don't know if that would be a good idea if I have a list of
>> 10000 titles. That would pretty much require either 10000 separate queries
>> or a very, very long WHERE clause.
>
>Yes, unfortunately. You should see if you can introduce a form of data
>normalisation - say, shadow fields with corrected entries, or functionality in
>the application that suggests correct entries based on what the user typed.
>
[JS] Except for obvious misspellings and non-ASCII characters, I do not have 
the freedom to muck with the text. If the data were created in-house, I could 
correct it on the way in; but it comes from myriad other companies.

>Or, if the money's there, you could have a look at Amazon Mechanical Turk (yes,
>really) for cheap-ish data correction.
>
[JS] Again, I can't change the data. The titles are assigned by the 
publishers. Think what would happen if Amazon decided to "fix" the titles of 
books. "Ain't Misbehavin" would, at best, turn into "I am not misbehaving".

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@stripped
Web site: www.the-infoshop.com



>--
>Beer with grenadine
>Is like mustard with wine
>She who drinks it is a prude
>He who drinks it is soon an ass


