Hi Rhino,
/>Maybe I'm just "old school" but I've always thought that you should
>choose a primary key based on data that is actually in the table
>whenever possible, rather than generating a new value out of thin air. /
Mebbe every db list should drag this out for re-examination once a year
or so :-) .
From the principle that the smaller the opportunity there is for
violation of PK
uniqueness, and from the fact that any real-world data, being empirical,
has error
bars, I conclude that in many cases the most robust PK is INT
auto_increment. For
a longer version of this argument (ie putting the cat amongst the
pigeons) click on
"Practical database design rules" at
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch01.pdf.
Happy holidays to one and all.
PB
-----
Rhino wrote:
> One technique that I see a lot on this mailing list is people putting
> auto-incremented integer primary keys on their tables.
>
> Maybe I'm just "old school" but I've always thought that you should
> choose a primary key based on data that is actually in the table
> whenever possible, rather than generating a new value out of thin air.
>
> The only exception that comes to mind is things like ID numbers; for
> example, it is better to use an internally-generated integer for an
> employee number than it is to use an employee's name. Even the
> combination of first name and last name is not necessarily unique - I
> could cite a real life example -and, of course, people can change
> their names. That makes names less desireable than a generated value
> when you are trying to uniquely indentify such entities. In such a
> case, a nice, reasonable short integer is easier.
>
> I just found this rather good definition of primary keys at
> http://www.utexas.edu/its/windows/database/datamodeling/dm/keys.html.
> The relevant bit says that a primary key must have:
> - a non-null value for each instance of the entity
> - a value that is unique for each instance of an entity
> - a value that must not change or become null during the life of the
> each instance of the entity
>
> That article makes the same basic remarks about name vs. ID but makes
> the point that it is more commonly the case that table designers will
> use something like a social security number - an _externally_
> generated number - to distinguish between employees rather than an
> internally-generated number.
>
> But the trend in this mailing list is toward using generated values as
> primary keys in virtually EVERY table, even when good primary keys can
> be found in the (non-generated) data already existing in the table.
>
> Now, I haven't done anything remotely resembling a quantified analysis
> so maybe I'm wildly exaggerating this trend. But I do seem to recall a
> lot of table descriptions with auto-generated keys and I don't think
> they were all a name vs. ID scenario....
>
> Has anyone else noticed a similar trend?
>
> If this trend is real, it doesn't seem like a very good trend to me.
> For example, if you were keeping track of parts in a warehouse, why
> would anyone make a table that looked like this:
> ID (autogenerated PK) PART_NO PART_DESCRIPTION
> 1 A01 Widget
> 2 B03 Grapple Grommet
> 3 A02 Snow Shovel
> 4 D11 Whisk
> 5 C04 Duct Tape
>
> when this table is simpler:
>
> PART_NO (PK) PART_DESCRIPTION
> A01 Widget
> B03 Grapple Grommet
> A02 Snow Shovel
> D11 Whisk
> C04 Duct Tape
>
> Would anyone care to convince me that the first version of the table
> is "better" than the second version in some way?
>
> I just want to be sure that no one has come along with some new and
> compelling reason to autogenerate keys when perfectly good keys can be
> found within the data already. I don't mind being "old school" but I
> don't want to be "out to lunch" :-)
>
>
> Rhino
>
>
>
Attachment: [text/html]
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.371 / Virus Database: 267.14.3/209 - Release Date: 12/21/2005