"Siegfried Heintze" <siegfried@stripped> wrote on 07/24/2005 11:35:36
> I have a large number of job titles (40K). Each job title has multiple
> keywords making a one-to-many parent-child relationship.
> If I join job title with company name, address, company url, company
> job name, job location, job url (etc...) I have a mighty wide result set
> that will be repeated for each keyword.
> What I have done in the past (in a different, much smaller, application)
> perform a join of everything except the keyword and store everything in
> Then I iterate thru each wide row in the hashmap and perform a separate
> SELECT statement foreach row in this hashmap to fetch the multiple
> Whew! That would be a lot of RAM (and paging) for this application.
> Are there any other more efficient approaches?
There are two major classes of efficiency when dealing with any RDBMS:
time efficiency (faster results), space efficiency (stored data takes less
room on the disk). Which one are you worried about?
If it were me, I would start with all of the data normalized:
* a Companies table (name, address, url, city, etc)
* a Job Titles table (a list of names)
* a Keywords table (a list of words used to describe Job Titles)
* a JobPosting table ( Relates Companies to Job Titles. Should
also be used to track things like dateposted, dateclosed, salary offered,
* a Postings_Keywords table (matches a Posting to multiple
I would only denormalize if testing showed a dramatic improvement in
performance by doing so. I would think that the Job Title to Keyword
relationship would be different between Companies. One company posting for
a "Programmer" may want VB while another wants PHP and PERL. By
associating the Keywords with a Posting (and not just the Job Title), you
can make that list Company-specific.
Unimin Corporation - Spruce Pine