List:General Discussion« Previous MessageNext Message »
From:SGreen Date:January 6 2005 3:05pm
Subject:Re: Retrieving partial field values
View as plain text  
After hearing both sides being suggested, I think I would recommend the 
more normalized approach of having one or more separate tables for the 
names (one name per row) and a mapping table that associates a name field 
(mother's maiden name, decedent's given name, decedent's surname, etc.) on 
a record (marriage, birth, death, etc.) to the possible names the 
transcriptionist thinks were written on the record. A simple index would 
very quickly find all full matches and partial beginning matches. With 
mid-word or word-ending matches**, the engine will still need to scan the 
entire list of names (not every name in the database, just the unique list 
of spellings which will be much smaller) .

This (the multi-table design) may add a little complexity to your GUI code 
but I think that the performance improvement on the searches will more 
than offset a little extra effort added to the front end and data entry 
portions of your process.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

** You could optimize word-ending searches by storing a copy of the name 
reversed. Then a regular index could also be used to quickly find the end 
of a name. 

Lets say you want to find all of the names that end in "esh"... You  have 
names like "Abesh", "Janesh", and "Phresh". If you added another column 
(indexed of course) to your names table that stored the names as "hsebA", 
"hsenaJ", and "hserhP" a search using LIKE 'hse%' would use that index 
where a search using LIKE '%esh' could not.

The trade off is that you nearly double the storage size of your names 
tables (one extra column plus its index) but you seriously improve search 
performance for an entire class of queries (names ending with ....).

"Jigal van Hemert" <jigal@stripped> wrote on 01/06/2005 09:30:08 AM:

> From: "Kentucky Families"
> 
> > A full-text search won't work. Too many records -- at least 8 million.
> From what I've read, full-text won't pick up the individual initials or 
very
> short names, like Vu, Lee, Doe, etc.
> 
> You can set the ft_min_word_len in an option file in the [mysqld] 
section to
> change the minimum word length (rebuild the full text indexes 
afterwards!)
> 
> Maybe alter the data a bit?
> If you store only firstnames you can search for J* to pick up an 
initial.
> If you only know initials simply use Jxzx as a first name (and let your
> application filter out the xzx part)
> If a name is shorter than four letters add xzx to the name and filter 
out in
> the application that shows the data.
> 
> Regards, Jigal.
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql?unsub=1
> 

Thread
Retrieving partial field valuesKentucky Families5 Jan
  • Re: Retrieving partial field valuesStephen Moretti5 Jan
    • Re: Retrieving partial field valuesIan Grant5 Jan
      • Re: Retrieving partial field valuesStephen Moretti5 Jan
      • Re: Retrieving partial field valuesKentucky Families5 Jan
        • Re: Retrieving partial field valuesJigal van Hemert6 Jan
          • Re: Retrieving partial field valuesKentucky Families6 Jan
            • Re: Retrieving partial field valuesJigal van Hemert6 Jan
            • Re: Retrieving partial field valuesJigal van Hemert6 Jan
              • Re: Retrieving partial field valuesSGreen6 Jan
            • RE: Retrieving partial field valuesAndy Eastham6 Jan
        • RE: Retrieving partial field valuesSteve Bacher6 Jan
        • Re: Retrieving partial field valuesIan Grant6 Jan
    • Re: Retrieving partial field valuesSGreen5 Jan