----- Original Message -----
To: "Kenneth Wagner" <kenneth_wagner@stripped>
Cc: "mysql" <mysql@stripped>; "Rhino" <rhino1@stripped>
Sent: Wednesday, December 21, 2005 5:15 PM
Subject: Re: Reason for Auto-increment primary keys?
> "Kenneth Wagner" <kenneth_wagner@stripped> wrote on 12/21/2005
> 04:27:53 PM:
>> Hi Rhino,
>> Excellent question. Felt as you do, initially.
>> Here's what changed my mind.
>> Integer keys are fast. And small. Hence, they take very little RAM
>> They are contiguous. A missing PK is easy to find. There's a gap in the
>> number sequence.
>> Can't do this with the part description. No way to tell if a record is
>> Example: The system gets hung up or crashes and a reboot is needed.
>> How to test the integrity of the parts table. I.e., anything missing?
>> the PK for
>> continuity is a good place to start. With a timestamp I would even know
>> where the file got truncated. Example. It's Dec 20th. The highest date
>> the file is
>> Dec 1st at rec# 1203023. That's where the analysis would begin. Other
>> didn't get truncated but have the related key # in them would tip me off
>> to how
>> much is missing. Like an order file.
>> Speed. Especially where related files are concerned. Foreign keys. Links
>> fields are faster, smaller and more efficient. Keys remain smaller and
>> Activity testing: Let's say I do some statistical testing. Like how
>> new parts
>> per month on average. Easy to do with the integer PK. Even easier if it
>> a timestamp.
>> Then if the average suddenly drops or increases I would want to know
> why. Or
>> my DB tables or coding. Note that the timestamp does not have to be in
>> table. It could be in an insert/update table that just tracks what has
>> added or updated
>> by PK, timestamp, activity type and updatedbyuserID.
>> So, there's 2 cents worth.
>> Wondering how relevant this is?
>> Ken Wagner
>> ----- Original Message -----
>> From: "Rhino" <rhino1@stripped>
>> To: "mysql" <mysql@stripped>
>> Sent: Wednesday, December 21, 2005 2:54 PM
>> Subject: Reason for Auto-increment primary keys?
>> > One technique that I see a lot on this mailing list is people putting
>> > auto-incremented integer primary keys on their tables.
>> > Maybe I'm just "old school" but I've always thought that you should
>> > a primary key based on data that is actually in the table whenever
>> > possible, rather than generating a new value out of thin air.
>> > The only exception that comes to mind is things like ID numbers; for
>> > example, it is better to use an internally-generated integer for an
>> > employee number than it is to use an employee's name. Even the
>> > of first name and last name is not necessarily unique - I could cite a
>> > real life example -and, of course, people can change their names. That
>> > makes names less desireable than a generated value when you are trying
>> > uniquely indentify such entities. In such a case, a nice, reasonable
>> > integer is easier.
>> > I just found this rather good definition of primary keys at
>> > http://www.utexas.edu/its/windows/database/datamodeling/dm/keys.html.
>> > relevant bit says that a primary key must have:
>> > - a non-null value for each instance of the entity
>> > - a value that is unique for each instance of an entity
>> > - a value that must not change or become null during the life of the
>> > instance of the entity
>> > That article makes the same basic remarks about name vs. ID but makes
>> > point that it is more commonly the case that table designers will use
>> > something like a social security number - an _externally_ generated
>> > number - to distinguish between employees rather than an
>> > internally-generated number.
>> > But the trend in this mailing list is toward using generated values as
>> > primary keys in virtually EVERY table, even when good primary keys can
>> > found in the (non-generated) data already existing in the table.
>> > Now, I haven't done anything remotely resembling a quantified analysis
>> > maybe I'm wildly exaggerating this trend. But I do seem to recall a
> lot of
>> > table descriptions with auto-generated keys and I don't think they
>> > all a name vs. ID scenario....
>> > Has anyone else noticed a similar trend?
>> > If this trend is real, it doesn't seem like a very good trend to me.
>> > example, if you were keeping track of parts in a warehouse, why would
>> > anyone make a table that looked like this:
>> > ID (autogenerated PK) PART_NO PART_DESCRIPTION
>> > 1 A01 Widget
>> > 2 B03 Grapple Grommet
>> > 3 A02 Snow Shovel
>> > 4 D11 Whisk
>> > 5 C04 Duct Tape
>> > when this table is simpler:
>> > PART_NO (PK) PART_DESCRIPTION
>> > A01 Widget
>> > B03 Grapple Grommet
>> > A02 Snow Shovel
>> > D11 Whisk
>> > C04 Duct Tape
>> > Would anyone care to convince me that the first version of the table
>> > "better" than the second version in some way?
>> > I just want to be sure that no one has come along with some new and
>> > compelling reason to autogenerate keys when perfectly good keys can be
>> > found within the data already. I don't mind being "old school" but I
>> > want to be "out to lunch" :-)
>> > Rhino
>> > --
>> > No virus found in this outgoing message.
>> > Checked by AVG Free Edition.
>> > Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date:
> I agree with every point that Kenneth just made. Integers compare 5 to 50
> times faster than strings (depending on the length of the string) and
> usually take up much less room. That means that more index items can fit
> into memory and you are less likely to cause memory paging during an index
> I frequently define both an auto_inc field and a PK on other values. I use
> the auto_inc field for FK relationships (due to the already mentioned
> reasons) but the PK is there to preserve my data integrity.
> Basically, the heavy use of auto_increment is a practical compromise of
> form vs. speed.
Well, you (meaning everyone who has responded, not just Shawn) have
certainly given me a wealth of good reasons to rethink my position on
auto-incremented keys! I will have to review these reasons very seriously
about this the next time I design a table....
Shawn, I'm not quite clear what you are saying in your second last
paragraph. When you have this situation:
ID (autogenerated) PART_NO PART_DESCRIPTION
1 A01 Widget
2 B03 Grapple Grommet
3 A02 Snow Shovel
4 D11 Whisk
5 C04 Duct Tape
Do you put the PK on ID alone, PART_NO alone, or the concatentation of ID
and PART_NO? I _think_ you mean that the PK is on PART_NO alone and that ID
is simply defined unique so that it can be the target for FKs that refer to
it but I want to be sure I'm not misreading you....
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date: 20/12/2005