I'm just "old school" but I've always thought that you should
>choose a primary key based on data that is actually in the table
>whenever possible, rather than generating a new value out of thin
Mebbe every db list should drag this out for re-examination once a year
or so :-) .
From the principle that the smaller the opportunity there is for
violation of PK
uniqueness, and from the fact that any real-world data, being
empirical, has error
bars, I conclude that in many cases the most robust PK is INT
a longer version of this argument (ie putting the cat amongst the
pigeons) click on
"Practical database design rules" at
Happy holidays to one and all.
technique that I see a lot on this mailing list is people putting
auto-incremented integer primary keys on their tables.
Maybe I'm just "old school" but I've always thought that you should
choose a primary key based on data that is actually in the table
whenever possible, rather than generating a new value out of thin air.
The only exception that comes to mind is things like ID numbers; for
example, it is better to use an internally-generated integer for an
employee number than it is to use an employee's name. Even the
combination of first name and last name is not necessarily unique - I
could cite a real life example -and, of course, people can change their
names. That makes names less desireable than a generated value when you
are trying to uniquely indentify such entities. In such a case, a nice,
reasonable short integer is easier.
I just found this rather good definition of primary keys at
The relevant bit says that a primary key must have:
- a non-null value for each instance of the entity
- a value that is unique for each instance of an entity
- a value that must not change or become null during the life of the
each instance of the entity
That article makes the same basic remarks about name vs. ID but makes
the point that it is more commonly the case that table designers will
use something like a social security number - an _externally_ generated
number - to distinguish between employees rather than an
But the trend in this mailing list is toward using generated values as
primary keys in virtually EVERY table, even when good primary keys can
be found in the (non-generated) data already existing in the table.
Now, I haven't done anything remotely resembling a quantified analysis
so maybe I'm wildly exaggerating this trend. But I do seem to recall a
lot of table descriptions with auto-generated keys and I don't think
they were all a name vs. ID scenario....
Has anyone else noticed a similar trend?
If this trend is real, it doesn't seem like a very good trend to me.
For example, if you were keeping track of parts in a warehouse, why
would anyone make a table that looked like this:
ID (autogenerated PK) PART_NO PART_DESCRIPTION
1 A01 Widget
2 B03 Grapple Grommet
3 A02 Snow Shovel
4 D11 Whisk
5 C04 Duct Tape
when this table is simpler:
PART_NO (PK) PART_DESCRIPTION
B03 Grapple Grommet
A02 Snow Shovel
C04 Duct Tape
Would anyone care to convince me that the first version of the table is
"better" than the second version in some way?
I just want to be sure that no one has come along with some new and
compelling reason to autogenerate keys when perfectly good keys can be
found within the data already. I don't mind being "old school" but I
don't want to be "out to lunch" :-)