List:General Discussion« Previous MessageNext Message »
From:Rhino Date:December 21 2005 10:57pm
Subject:Re: Reason for Auto-increment primary keys?
View as plain text  
----- Original Message ----- 
From: <SGreen@stripped>
To: "Kenneth Wagner" <kenneth_wagner@stripped>
Cc: "mysql" <mysql@stripped>; "Rhino" <rhino1@stripped>
Sent: Wednesday, December 21, 2005 5:15 PM
Subject: Re: Reason for Auto-increment primary keys?


> "Kenneth Wagner" <kenneth_wagner@stripped> wrote on 12/21/2005
> 04:27:53 PM:
>
>> Hi Rhino,
>>
>> Excellent question. Felt as you do, initially.
>>
>> Here's what changed my mind.
>>
>> Integer keys are fast. And small. Hence, they take very little RAM
> space.
>>
>> They are contiguous. A missing PK is easy to find. There's a gap in the
>> number sequence.
>> Can't do this with the part description. No way to tell if a record is
>> missing.
>>
>> Example: The system gets hung up or crashes and a reboot is needed.
>> How to test the integrity of the parts table. I.e., anything missing?
> Check
>> the PK for
>> continuity is a good place to start. With a timestamp I would even know
> the
>> date
>> where the file got truncated. Example. It's Dec 20th. The highest date
> in
>> the file is
>> Dec 1st at rec# 1203023. That's where the analysis would begin. Other
> files
>> that
>> didn't get truncated but have the related key # in them would tip me off
> as
>> to how
>> much is missing. Like an order file.
>>
>> Speed. Especially where related files are concerned. Foreign keys. Links
> on
>> integer
>> fields are faster, smaller and more efficient. Keys remain smaller and
>> faster.
>>
>> Activity testing: Let's say I do some statistical testing.  Like how
> many
>> new parts
>> per month on average. Easy to do with the integer PK. Even easier if it
> has
>> a timestamp.
>> Then if the average suddenly drops or increases I would want to know
> why. Or
>> modify
>> my DB tables or coding. Note that the timestamp does not have to be in
> your
>> example
>> table. It could be in an insert/update table that just tracks what has
> been
>> added or updated
>> by PK, timestamp, activity type and updatedbyuserID.
>>
>> So, there's 2 cents worth.
>>
>> Wondering how relevant this is?
>>
>> HTH,
>>
>> Ken Wagner
>>
>>
>>
>> ----- Original Message ----- 
>> From: "Rhino" <rhino1@stripped>
>> To: "mysql" <mysql@stripped>
>> Sent: Wednesday, December 21, 2005 2:54 PM
>> Subject: Reason for Auto-increment primary keys?
>>
>>
>> > One technique that I see a lot on this mailing list is people putting
>> > auto-incremented integer primary keys on their tables.
>> >
>> > Maybe I'm just "old school" but I've always thought that you should
> choose
>> > a primary key based on data that is actually in the table whenever
>> > possible, rather than generating a new value out of thin air.
>> >
>> > The only exception that comes to mind is things like ID numbers; for
>> > example, it is better to use an internally-generated integer for an
>> > employee number than it is to use an employee's name. Even the
> combination
>> > of first name and last name is not necessarily unique - I could cite a
>
>> > real life example -and, of course, people can change their names. That
>
>> > makes names less desireable than a generated value when you are trying
> to
>> > uniquely indentify such entities. In such a case, a nice, reasonable
> short
>> > integer is easier.
>> >
>> > I just found this rather good definition of primary keys at
>> > http://www.utexas.edu/its/windows/database/datamodeling/dm/keys.html.
> The
>> > relevant bit says that a primary key must have:
>> > - a non-null value for each instance of the entity
>> > - a value that is unique for each instance of an entity
>> > - a value that must not change or become null during the life of the
> each
>> > instance of the entity
>> >
>> > That article makes the same basic remarks about name vs. ID but makes
> the
>> > point that it is more commonly the case that table designers will use
>> > something like a social security number - an _externally_ generated
>> > number - to distinguish between employees rather than an
>> > internally-generated number.
>> >
>> > But the trend in this mailing list is toward using generated values as
>
>> > primary keys in virtually EVERY table, even when good primary keys can
> be
>> > found in the (non-generated) data already existing in the table.
>> >
>> > Now, I haven't done anything remotely resembling a quantified analysis
> so
>> > maybe I'm wildly exaggerating this trend. But I do seem to recall a
> lot of
>> > table descriptions with auto-generated keys and I don't think they
> were
>> > all a name vs. ID scenario....
>> >
>> > Has anyone else noticed a similar trend?
>> >
>> > If this trend is real, it doesn't seem like a very good trend to me.
> For
>> > example, if you were keeping track of parts in a warehouse, why would
>> > anyone make a table that looked like this:
>> > ID (autogenerated PK)     PART_NO    PART_DESCRIPTION
>> > 1                                   A01             Widget
>> > 2                                    B03            Grapple Grommet
>> > 3                                    A02            Snow Shovel
>> > 4                                    D11            Whisk
>> > 5                                    C04            Duct Tape
>> >
>> > when this table is simpler:
>> >
>> > PART_NO (PK)   PART_DESCRIPTION
>> > A01                 Widget
>> > B03                Grapple Grommet
>> > A02                Snow Shovel
>> > D11                Whisk
>> > C04                Duct Tape
>> >
>> > Would anyone care to convince me that the first version of the table
> is
>> > "better" than the second version in some way?
>> >
>> > I just want to be sure that no one has come along with some new and
>> > compelling reason to autogenerate keys when perfectly good keys can be
>
>> > found within the data already. I don't mind being "old school" but I
> don't
>> > want to be "out to lunch" :-)
>> >
>> >
>> > Rhino
>> >
>> >
>> >
>> > -- 
>> > No virus found in this outgoing message.
>> > Checked by AVG Free Edition.
>> > Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date:
> 20/12/2005
>> >
>
> I agree with every point that Kenneth just made. Integers compare 5 to 50
> times faster than strings (depending on the length of the string) and
> usually take up much less room. That means that more index items can fit
> into memory and you are less likely to cause memory paging during an index
> operation.
>
> I frequently define both an auto_inc field and a PK on other values. I use
> the auto_inc field for FK relationships (due to the already mentioned
> reasons) but the PK is there to preserve my data integrity.
>
> Basically, the heavy use of auto_increment is a practical compromise of
> form vs. speed.
>
Well, you (meaning everyone who has responded, not just Shawn) have 
certainly given me a wealth of good reasons to rethink my position on 
auto-incremented keys! I will have to review these reasons very seriously 
about this the next time I design a table....

Shawn, I'm not quite clear what you are saying in your second last 
paragraph. When you have this situation:

ID (autogenerated)     PART_NO    PART_DESCRIPTION
1                              A01             Widget
2                              B03            Grapple Grommet
3                              A02            Snow Shovel
4                              D11            Whisk
5                              C04            Duct Tape

Do you put the PK on ID alone, PART_NO alone, or the concatentation of ID 
and PART_NO? I _think_ you mean that the PK is on PART_NO alone and that ID 
is simply defined unique so that it can be the target for FKs that refer to 
it but I want to be sure I'm not misreading you....

Rhino




-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date: 20/12/2005

Thread
Reason for Auto-increment primary keys?Rhino21 Dec
  • Re: Reason for Auto-increment primary keys?Kenneth Wagner21 Dec
    • Re: Reason for Auto-increment primary keys?SGreen21 Dec
      • Re: Reason for Auto-increment primary keys?Rhino21 Dec
        • Re: Reason for Auto-increment primary keys?SGreen22 Dec
      • Re: Reason for Auto-increment primary keys?Kenneth Wagner22 Dec
        • Re: Reason for Auto-increment primary keys?David Griffiths22 Dec
        • Are primary keys essential?James Harvard22 Dec
      • Re: Are primary keys essential?Kenneth Wagner22 Dec
        • Re: Are primary keys essential?James Harvard22 Dec
          • Re: Are primary keys essential?James Harvard22 Dec
      • Re: Are primary keys essential?Kenneth Wagner22 Dec
      • Re: Are primary keys essential?Rhino22 Dec
      • Reporting tools for summary dataC.R.Vegelin22 Dec
        • Re: Reporting tools for summary dataJames Harvard22 Dec
    • Re: Reason for Auto-increment primary keys?Josh Trutwin21 Dec
  • Re: Reason for Auto-increment primary keys?Peter Brawley21 Dec
    • Re: Reason for Auto-increment primary keys?James Harvard22 Dec
  • Re: Reason for Auto-increment primary keys?Rudy Lippan21 Dec
RE: Are primary keys essential?SST - Adelaide)22 Dec