List: General Discussion
From: shawn wilson
Date: May 3 2011 8:08pm
Subject: Re: Join based upon LIKE

I'm actually enjoying this discussion because I have the same type of issue.
However, I have given up on doing a full-text search in favor of making a
table of unique fields, where the fields taken together should uniquely
identify the group. If I get a dupe, I can clean it up.

However, like you, they don't want me to mess with the original data. So,
what I have is another table with my good data that my table with my unique
data refers to. If a bad record is created, I don't care; I just create my
relationship to the table of data I know (read: think, since I rarely look at
this stuff) is good.

So, I have 4 fields that should be unique for a group: two chars and two
ints. If three of these match a record in the 'good data' table, there's my
relationship. If two or fewer match, I create a new record in my 'good data'
table and log the event. (I haven't gotten to the logging part yet, though;
it's easy enough just to look, since none of the fields in 'good data' should
match.)
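
The three-of-four rule above can be sketched roughly as follows. This is only
an illustration in Python, not the code I actually run: the field names
(char_a, char_b, int_a, int_b), the GroupKey class, and the link_or_create
function are all made up for the example, and the 'good data' table is
stood in for by a plain list.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class GroupKey:
    # Hypothetical field names: two character fields and two integer fields.
    char_a: str
    char_b: str
    int_a: int
    int_b: int


def count_matches(a: GroupKey, b: GroupKey) -> int:
    """Count how many of the four fields are equal, position by position."""
    return sum(x == y for x, y in zip(astuple(a), astuple(b)))


def link_or_create(candidate: GroupKey, good_data: list, min_matches: int = 3):
    """Return (index into good_data, created flag).

    If an existing record matches on at least min_matches fields, link to
    the best such record. Otherwise append the candidate as a new 'good'
    record; the created flag tells the caller to log the event.
    """
    best_index, best_score = -1, -1
    for i, record in enumerate(good_data):
        score = count_matches(candidate, record)
        if score > best_score:
            best_index, best_score = i, score
    if best_score >= min_matches:
        return best_index, False
    good_data.append(candidate)
    return len(good_data) - 1, True
```

For example, with one good record ("ACME", "NY", 10, 200), an incoming
("ACME", "NYC", 10, 200) matches on three fields and is linked to it, while
("ACME", "LA", 10, 999) matches on only two and becomes a new record.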

I'm thinking you might have to dig deeper than me to find 'good data', but I
think it's there. Maybe isbn, name, publisher + address, price, average
pages, name of sales person, who you guys pay for the material, etc.


On May 3, 2011 10:59 AM, "Johan De Meersman" <vegivamp@stripped> wrote:
>
>
> ----- Original Message -----
> > From: "Jerry Schwartz" <jerry@stripped>
> >
> > I'm not sure that I could easily build a dictionary of non-junk
> > words, since
>
> The traditional way is to build a database of junk words. The list tends
> to be shorter :-)
>
> Think and/or/it/the/with/like/...
>
> Percentages of mutual and non-mutual words between two titles should be a
> reasonable indicator of likeness. You could conceivably even assign value to
> individual words, so "polypropylbutanate" is more useful than "synergy" for
> comparison purposes.
>
> All very theoretical, though, I haven't actually done much of it to this
> level. My experience in data mangling is limited to mostly
> should-be-fixed-format data like sports results.
>
>
> --
> Bier met grenadyn
> Is als mosterd by den wyn
> Sy die't drinkt, is eene kwezel
> Hy die't drinkt, is ras een ezel
>
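
The junk-word and word-overlap idea in the quoted message could look
something like the sketch below. It is a guess at one way to do it, not
anything from the thread: the JUNK_WORDS list goes beyond the handful of
words Johan names, the function names are invented, and the score is a
Jaccard-style ratio (weight of shared words over weight of all words), which
is my choice of measure. The optional weights dict is the "assign value to
individual words" part; any word not listed counts as 1.0.

```python
import re

# Johan's examples (and/or/it/the/with/like) plus a few assumed extras.
JUNK_WORDS = frozenset(
    {"and", "or", "it", "the", "with", "like", "a", "an", "of", "in", "for", "to"}
)


def significant_words(title: str) -> set:
    """Lowercase the title, split it into words, and drop the junk words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in JUNK_WORDS}


def title_likeness(a: str, b: str, weights: dict = None) -> float:
    """Return a score in [0, 1]: weight of shared words / weight of all words.

    With no weights this is the plain fraction of mutual words among all
    non-junk words appearing in either title.
    """
    weights = weights or {}
    words_a, words_b = significant_words(a), significant_words(b)
    all_words = words_a | words_b
    if not all_words:
        return 0.0  # nothing but junk words: no basis for comparison
    shared = words_a & words_b
    total = sum(weights.get(w, 1.0) for w in all_words)
    common = sum(weights.get(w, 1.0) for w in shared)
    return common / total
```

Comparing "The Market for Polypropylene" with "Polypropylene Market in
Europe" leaves {market, polypropylene} and {polypropylene, market, europe},
so two of three words are shared. Giving "polypropylene" a weight of 3.0
raises the score from 2/3 to 4/5, since the rare word now counts for more
than the generic ones.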
