List:General Discussion« Previous MessageNext Message »
From:Michael Stassen Date:August 17 2004 9:03pm
Subject:Re: Assistance with SQL syntax: pulling duplicates back
View as plain text  
Eve,

Best to keep threads on the list.  Others may have better ideas, and future 
readers may benefit.

The comparison

   candidate.Location IN ('CA', 'California')

will match 'CA' and 'California', but will not match 'Cupertino, CA' because 
it isn't either of those strings.  To match that row as well, you would need 
to use LIKE and a wildcard (or RLIKE) instead.  Something like:

   candidate.Location LIKE '%CA' OR candidate.Location LIKE '%California'

Unfortunately, the index can't be used then because of the wildcard.  Mysql 
will have to look at every row.  The more rows you have, the worse the 
impact will be.  That may be OK if the rest of your WHERE criteria 
sufficiently pare down the number of rows first, but you'd have to test to 
be sure.

Even then, this method will generally yield incorrect matches.  For example, 
consider

   candidate.Location LIKE '%NE' OR candidate.Location LIKE '%Nebraska'

That would match 'Bangor, Maine' because it ends with 'ne'.  Or how about

   candidate.Location LIKE '%IA' OR candidate.Location LIKE '%Iowa'

That would match 'California' because it ends with 'ia'.  See the problem? 
We could reduce these by making the comparisons case-sensitive with the 
BINARY keyword:

   candidate.Location LIKE BINARY '%IA'
   OR candidate.Location LIKE '%Iowa'

That would no longer match 'California', but it would still match 'CALIFORNIA'.

Finally, consider that

     candidate.Location LIKE BINARY '%CA'
     OR candidate.Location LIKE '%California'

will not match 'Pasadena, California, USA'.

I doubt it's what you wanted to hear, but the problem is that the Location 
column is poorly designed.  It contains the answers to different questions. 
  That is, multiple/different kinds of data are crammed into one column. 
The only sure-fire way to perform searches by state is to have a state 
column.  You need to fix the db, and its data.  If you really cannot fix the 
db and data, you will have to live with slow queries and imperfect results.

Michael

Eve Atley wrote:

> Thanks for helping me out, Michael! I've learned *several* things today.
> 
> I have ended up using:
> candidate.Location IN ('CA', 'California')
> 
> However, you're correct: candidate.Location can contain more than just the
> state, at times. It was the way the database was previously designed,
> unfortunately. It can include:
> 
> 'Cupertino, CA'
> or
> 'CA'
> or
> 'California'
> 
> Will this still work for what I require?
> 
> 
> 

Thread
GUI for MySQLKirti S. Bajwa12 Aug
  • AW: GUI for MySQLFreddie Sorensen12 Aug
  • Re: GUI for MySQLKaram Chand13 Aug
  • Re: GUI for MySQLMartijn Tonies13 Aug
    • Full text for keyword(s) search?Eve Atley17 Aug
RE: GUI for MySQLGuillermo Chinchilla Zúñiga12 Aug
RE: Full text for keyword(s) search?Victor Pendleton17 Aug
  • Assistance with SQL syntax: pulling duplicates backEve Atley17 Aug
    • Re: Assistance with SQL syntax: pulling duplicates backMichael Stassen17 Aug
      • Re: Assistance with SQL syntax: pulling duplicates backMichael Stassen17 Aug
    • Re: Assistance with SQL syntax: pulling duplicates backSGreen17 Aug
RE: Assistance with SQL syntax: pulling duplicates backVictor Pendleton17 Aug
Re: Assistance with SQL syntax: pulling duplicates backMichael Stassen17 Aug