"Jigal van Hemert" <jigal@stripped> wrote on 04/26/2005 10:35:06 AM:
> From: "Harald Fuchs"
> > > id INT(11) - accountID
> > > name VARCHAR(32) - parameter name
> > > value INT(11) - parameter value
> > > Other tables contain string, datetime, etc. parameters.
> > > Since most searches are made for a value (or range) of one or more
> > > parameters, a usable primary key is:
> > > name-value-id
> > That's a horrible denormalization. If one named parameter can hold
> > only one INT value for one account id, then (id, name) could be a
> > primary key; otherwise, you'd need a surrogate primary key.
> And what if you want to find the IDs which have a certain value in a
> parameter? Then you're searching for name and value, and only need the id
> as the glue for joins.
> The situation is that I have to store a variable (and changing) number of
> parameters for a large number of IDs. The traditional construction of a
> column for each parameter is not usable anymore:
> - modifying the table structure for new parameters requires a lot of work
> - all unused parameters will still take space
> - design of index(es) is virtually impossible because searches are made
> on various combinations of parameters
> The parameter table solution is rather simple, but has some performance
> complications once you have 26,000,000 records for 475,000 accounts in a
> 5.3 GB database...
> How would *you* normalize this situation? All suggestions are welcome!
> Regards, Jigal.
I think what tripped us up (and raised the red flags) was that you called
your index on (id, name, value) a PRIMARY KEY. However, the unique
combination of values on your parameters table should have only been for
(id, name), right? The id-name combination will uniquely identify a
parameter for an account (and each account cannot have the same parameter
more than once, can it?). So your PK should be on (id, name) and a regular
index could be placed on (id, name, value) as a covering index.
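A minimal sketch of that layout (table and column names are my own guesses from the thread; I'm using SQLite through Python just to keep the example self-contained, though the actual target is MySQL/InnoDB):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Parameter table as discussed: one row per (account, parameter).
# PRIMARY KEY (id, name) enforces "one value per parameter per account".
cur.execute("""
    CREATE TABLE int_params (
        id    INTEGER NOT NULL,  -- account id
        name  TEXT    NOT NULL,  -- parameter name
        value INTEGER NOT NULL,  -- parameter value
        PRIMARY KEY (id, name)
    )
""")

# The optional covering index -- note it repeats every column of the
# table, which is exactly the space concern discussed in this mail.
cur.execute("CREATE INDEX ix_cover ON int_params (id, name, value)")

cur.executemany("INSERT INTO int_params VALUES (?, ?, ?)",
                [(1, "age", 30), (2, "age", 45), (2, "score", 7)])

# A second value for the same (id, name) is rejected by the PK:
try:
    cur.execute("INSERT INTO int_params VALUES (2, 'age', 99)")
except sqlite3.IntegrityError:
    print("duplicate (id, name) rejected")

cur.execute("SELECT value FROM int_params WHERE id = ? AND name = ?",
            (2, "age"))
print(cur.fetchall())  # -> [(45,)]
```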
However, your covering index will be a complete duplicate of your table
and based on the numbers you give us, it probably won't fit into memory.
You would end up trying to use paged memory to search an index to get at
the value of the 'value' column just to avoid an extra trip to the disk to
get the value based on a PK match. I don't think the extra effort (and
disk space used and working memory consumed) would actually help your
retrieval speed. If it were me, I would not index that table beyond the PK
on just (id, name). But, I would still test it both ways just to be sure.
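Testing it both ways can be sketched along these lines (again SQLite via Python for a self-contained demo; on the real 26,000,000-row MySQL table you would compare EXPLAIN output and actual timings instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE int_params (id INTEGER, name TEXT, value INTEGER, "
            "PRIMARY KEY (id, name))")

def plan(sql, args=()):
    # Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql, args)]

query = "SELECT value FROM int_params WHERE id = ? AND name = ?"
print(plan(query, (1, "age")))  # with only the PK, the PK index is used

cur.execute("CREATE INDEX ix_cover ON int_params (id, name, value)")
print(plan(query, (1, "age")))  # the planner may now prefer the covering index
```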
I had a similar situation to yours and chose a similar solution. My data
storage challenge was to make millions of laboratory reports accessible
for statistical analysis. Each report can contain any number of analyses
(from 2 to 60), depending on what tests were requested when the
sample was submitted to the lab. I also created a table for reports and a
separate table for the test-result pairs (fortunately for me, all of my
results were numeric or could be represented with numbers). The two tables
are linked through a FK (I am also using InnoDB). Not only is this saving
space but the design is flexible because I do not need to change my tables
every time the lab comes up with a new analysis it can perform (for the
exact same reasons you described above).
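The shape of that two-table design, roughly (all table and column names here are made up for illustration; SQLite via Python again rather than the InnoDB tables I actually use):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
cur = conn.cursor()

# One row per report, plus one row per (test, result) pair on that report,
# linked through a foreign key.
cur.execute("""CREATE TABLE reports (
    report_id INTEGER PRIMARY KEY,
    sample_no TEXT NOT NULL)""")
cur.execute("""CREATE TABLE results (
    report_id INTEGER NOT NULL REFERENCES reports(report_id),
    test_name TEXT NOT NULL,
    result    REAL NOT NULL,
    PRIMARY KEY (report_id, test_name))""")

cur.execute("INSERT INTO reports VALUES (1, 'S-1001')")
cur.executemany("INSERT INTO results VALUES (?, ?, ?)",
                [(1, 'pH', 7.2), (1, 'Fe', 0.03)])

# A brand-new analysis needs no schema change: just insert another row.
cur.execute("INSERT INTO results VALUES (1, 'SiO2', 98.5)")

cur.execute("""SELECT r.sample_no, t.test_name, t.result
               FROM reports r JOIN results t USING (report_id)
               ORDER BY t.test_name""")
print(cur.fetchall())
```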
Don't get discouraged, OK? ;-)
Unimin Corporation - Spruce Pine