List:General Discussion« Previous MessageNext Message »
From:David Griffiths Date:December 21 2005 11:30pm
Subject:Re: Reason for Auto-increment primary keys?
View as plain text  
Auto-incremented integers (be it bigint, mediumint, etc) are, from a 
purist point of view, better than "natural" primary keys, like part 
number etc. Read Practical Issues in Database Management, by Fabian 
Pascal. He argues against natural primary keys, because the business 
rules that underly the data could change.

The example you use is the social security number - say the US federal 
government decides that a change is required. Say families will all have 
the same SSN number, so that they may file joint income tax returns. You 
have no control over it.

Also, in order to guarantee uniqueness, you often have to use a 
composite primary key (a primary key made up of many columns). This adds 
complexity and size to your database. It also makes using tools like 
Hibernate (an database <--> java-objects mapping tool that is very 
popular and powerful) more difficult to use (at least in version 2.1), 
because composite primary keys have extra overhead.

If a part-number and manufacturer (or supplier) are a unique 
combination, create a unique index on the two. Keep your primary key 
abstract (and unnatural).

I know there are alot of "what-if" statements (I mean, how often will 
the SSN change?), and maybe you've never heard of Hibernate let alone 
had any desire to use it, but the underlying argument is valid - 
business rules change. For complex schemas, with lots of data, that 
could be a nightmare if some external data source that you have no 
control over suddenly changes the rules underlying their data.

Timestamps (last-modified-dates) are very useful for trying to track 
down problems.

David

Kenneth Wagner wrote:

> Rhino,
>
> What I do is put the ID (integer Primary Key, auto-increment, unique) 
> first.
> First key in every file.
>
> Then define my indexes.
>
> You could, do it the other way as you ask. But, I found this way is 
> consistent.
> It can be traced anywhere on any file. Timestamp on important or 
> critical files
> is also a big plus. It's not needed on static or almost static files 
> like counties,
> states, departments, etc. But orders, parts, inventory, customers and 
> so on
> benefit from a timestamp. Beside SQL is optimized for it AFAIK.
>
> This hasn't ever been an issue aside from a few people asking about 
> how to
> use the ID? Or what's it for?
>
> Ken
>
> ----- Original Message ----- From: "Rhino" <rhino1@stripped>
> To: "Kenneth Wagner" <kenneth_wagner@stripped>; <SGreen@stripped>
> Cc: "mysql" <mysql@stripped>
> Sent: Wednesday, December 21, 2005 4:57 PM
> Subject: Re: Reason for Auto-increment primary keys?
>
>
>>
>> ----- Original Message ----- From: <SGreen@stripped>
>> To: "Kenneth Wagner" <kenneth_wagner@stripped>
>> Cc: "mysql" <mysql@stripped>; "Rhino" <rhino1@stripped>
>> Sent: Wednesday, December 21, 2005 5:15 PM
>> Subject: Re: Reason for Auto-increment primary keys?
>>
>>
>>> "Kenneth Wagner" <kenneth_wagner@stripped> wrote on 12/21/2005
>>> 04:27:53 PM:
>>>
>>>> Hi Rhino,
>>>>
>>>> Excellent question. Felt as you do, initially.
>>>>
>>>> Here's what changed my mind.
>>>>
>>>> Integer keys are fast. And small. Hence, they take very little RAM
>>>
>>> space.
>>>
>>>>
>>>> They are contiguous. A missing PK is easy to find. There's a gap in 
>>>> the
>>>> number sequence.
>>>> Can't do this with the part description. No way to tell if a record is
>>>> missing.
>>>>
>>>> Example: The system gets hung up or crashes and a reboot is needed.
>>>> How to test the integrity of the parts table. I.e., anything missing?
>>>
>>> Check
>>>
>>>> the PK for
>>>> continuity is a good place to start. With a timestamp I would even 
>>>> know
>>>
>>> the
>>>
>>>> date
>>>> where the file got truncated. Example. It's Dec 20th. The highest date
>>>
>>> in
>>>
>>>> the file is
>>>> Dec 1st at rec# 1203023. That's where the analysis would begin. Other
>>>
>>> files
>>>
>>>> that
>>>> didn't get truncated but have the related key # in them would tip 
>>>> me off
>>>
>>> as
>>>
>>>> to how
>>>> much is missing. Like an order file.
>>>>
>>>> Speed. Especially where related files are concerned. Foreign keys. 
>>>> Links
>>>
>>> on
>>>
>>>> integer
>>>> fields are faster, smaller and more efficient. Keys remain smaller and
>>>> faster.
>>>>
>>>> Activity testing: Let's say I do some statistical testing.  Like how
>>>
>>> many
>>>
>>>> new parts
>>>> per month on average. Easy to do with the integer PK. Even easier 
>>>> if it
>>>
>>> has
>>>
>>>> a timestamp.
>>>> Then if the average suddenly drops or increases I would want to know
>>>
>>> why. Or
>>>
>>>> modify
>>>> my DB tables or coding. Note that the timestamp does not have to be in
>>>
>>> your
>>>
>>>> example
>>>> table. It could be in an insert/update table that just tracks what has
>>>
>>> been
>>>
>>>> added or updated
>>>> by PK, timestamp, activity type and updatedbyuserID.
>>>>
>>>> So, there's 2 cents worth.
>>>>
>>>> Wondering how relevant this is?
>>>>
>>>> HTH,
>>>>
>>>> Ken Wagner
>>>>
>>>>
>>>>
>>>> ----- Original Message ----- From: "Rhino" <rhino1@stripped>
>>>> To: "mysql" <mysql@stripped>
>>>> Sent: Wednesday, December 21, 2005 2:54 PM
>>>> Subject: Reason for Auto-increment primary keys?
>>>>
>>>>
>>>> > One technique that I see a lot on this mailing list is people 
>>>> putting
>>>> > auto-incremented integer primary keys on their tables.
>>>> >
>>>> > Maybe I'm just "old school" but I've always thought that you should
>>>
>>> choose
>>>
>>>> > a primary key based on data that is actually in the table whenever
>>>> > possible, rather than generating a new value out of thin air.
>>>> >
>>>> > The only exception that comes to mind is things like ID numbers;
> for
>>>> > example, it is better to use an internally-generated integer for an
>>>> > employee number than it is to use an employee's name. Even the
>>>
>>> combination
>>>
>>>> > of first name and last name is not necessarily unique - I could 
>>>> cite a
>>>
>>>
>>>> > real life example -and, of course, people can change their names. 
>>>> That
>>>
>>>
>>>> > makes names less desireable than a generated value when you are 
>>>> trying
>>>
>>> to
>>>
>>>> > uniquely indentify such entities. In such a case, a nice,
> reasonable
>>>
>>> short
>>>
>>>> > integer is easier.
>>>> >
>>>> > I just found this rather good definition of primary keys at
>>>> > 
>>>> http://www.utexas.edu/its/windows/database/datamodeling/dm/keys.html.
>>>
>>> The
>>>
>>>> > relevant bit says that a primary key must have:
>>>> > - a non-null value for each instance of the entity
>>>> > - a value that is unique for each instance of an entity
>>>> > - a value that must not change or become null during the life of
> the
>>>
>>> each
>>>
>>>> > instance of the entity
>>>> >
>>>> > That article makes the same basic remarks about name vs. ID but 
>>>> makes
>>>
>>> the
>>>
>>>> > point that it is more commonly the case that table designers will 
>>>> use
>>>> > something like a social security number - an _externally_ generated
>>>> > number - to distinguish between employees rather than an
>>>> > internally-generated number.
>>>> >
>>>> > But the trend in this mailing list is toward using generated 
>>>> values as
>>>
>>>
>>>> > primary keys in virtually EVERY table, even when good primary 
>>>> keys can
>>>
>>> be
>>>
>>>> > found in the (non-generated) data already existing in the table.
>>>> >
>>>> > Now, I haven't done anything remotely resembling a quantified 
>>>> analysis
>>>
>>> so
>>>
>>>> > maybe I'm wildly exaggerating this trend. But I do seem to recall a
>>>
>>> lot of
>>>
>>>> > table descriptions with auto-generated keys and I don't think they
>>>
>>> were
>>>
>>>> > all a name vs. ID scenario....
>>>> >
>>>> > Has anyone else noticed a similar trend?
>>>> >
>>>> > If this trend is real, it doesn't seem like a very good trend to
> me.
>>>
>>> For
>>>
>>>> > example, if you were keeping track of parts in a warehouse, why 
>>>> would
>>>> > anyone make a table that looked like this:
>>>> > ID (autogenerated PK)     PART_NO    PART_DESCRIPTION
>>>> > 1                                   A01             Widget
>>>> > 2                                    B03            Grapple Grommet
>>>> > 3                                    A02            Snow Shovel
>>>> > 4                                    D11            Whisk
>>>> > 5                                    C04            Duct Tape
>>>> >
>>>> > when this table is simpler:
>>>> >
>>>> > PART_NO (PK)   PART_DESCRIPTION
>>>> > A01                 Widget
>>>> > B03                Grapple Grommet
>>>> > A02                Snow Shovel
>>>> > D11                Whisk
>>>> > C04                Duct Tape
>>>> >
>>>> > Would anyone care to convince me that the first version of the
> table
>>>
>>> is
>>>
>>>> > "better" than the second version in some way?
>>>> >
>>>> > I just want to be sure that no one has come along with some new and
>>>> > compelling reason to autogenerate keys when perfectly good keys 
>>>> can be
>>>
>>>
>>>> > found within the data already. I don't mind being "old school" but
> I
>>>
>>> don't
>>>
>>>> > want to be "out to lunch" :-)
>>>> >
>>>> >
>>>> > Rhino
>>>> >
>>>> >
>>>> >
>>>> > -- > No virus found in this outgoing message.
>>>> > Checked by AVG Free Edition.
>>>> > Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date:
>>>
>>> 20/12/2005
>>>
>>>> >
>>>
>>>
>>> I agree with every point that Kenneth just made. Integers compare 5 
>>> to 50
>>> times faster than strings (depending on the length of the string) and
>>> usually take up much less room. That means that more index items can 
>>> fit
>>> into memory and you are less likely to cause memory paging during an 
>>> index
>>> operation.
>>>
>>> I frequently define both an auto_inc field and a PK on other values. 
>>> I use
>>> the auto_inc field for FK relationships (due to the already mentioned
>>> reasons) but the PK is there to preserve my data integrity.
>>>
>>> Basically, the heavy use of auto_increment is a practical compromise of
>>> form vs. speed.
>>>
>> Well, you (meaning everyone who has responded, not just Shawn) have 
>> certainly given me a wealth of good reasons to rethink my position on 
>> auto-incremented keys! I will have to review these reasons very 
>> seriously about this the next time I design a table....
>>
>> Shawn, I'm not quite clear what you are saying in your second last 
>> paragraph. When you have this situation:
>>
>> ID (autogenerated)     PART_NO    PART_DESCRIPTION
>> 1                              A01             Widget
>> 2                              B03            Grapple Grommet
>> 3                              A02            Snow Shovel
>> 4                              D11            Whisk
>> 5                              C04            Duct Tape
>>
>> Do you put the PK on ID alone, PART_NO alone, or the concatentation 
>> of ID and PART_NO? I _think_ you mean that the PK is on PART_NO alone 
>> and that ID is simply defined unique so that it can be the target for 
>> FKs that refer to it but I want to be sure I'm not misreading you....
>>
>> Rhino
>>
>>
>>
>>
>> -- 
>> No virus found in this outgoing message.
>> Checked by AVG Free Edition.
>> Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date: 
>> 20/12/2005
>>
>>
>> -- 
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe: 
>> http://lists.mysql.com/mysql?unsub=1
>>
>>
>
>
>
Thread
Reason for Auto-increment primary keys?Rhino21 Dec
  • Re: Reason for Auto-increment primary keys?Kenneth Wagner21 Dec
    • Re: Reason for Auto-increment primary keys?SGreen21 Dec
      • Re: Reason for Auto-increment primary keys?Rhino21 Dec
        • Re: Reason for Auto-increment primary keys?SGreen22 Dec
      • Re: Reason for Auto-increment primary keys?Kenneth Wagner22 Dec
        • Re: Reason for Auto-increment primary keys?David Griffiths22 Dec
        • Are primary keys essential?James Harvard22 Dec
      • Re: Are primary keys essential?Kenneth Wagner22 Dec
        • Re: Are primary keys essential?James Harvard22 Dec
          • Re: Are primary keys essential?James Harvard22 Dec
      • Re: Are primary keys essential?Kenneth Wagner22 Dec
      • Re: Are primary keys essential?Rhino22 Dec
      • Reporting tools for summary dataC.R.Vegelin22 Dec
        • Re: Reporting tools for summary dataJames Harvard22 Dec
    • Re: Reason for Auto-increment primary keys?Josh Trutwin21 Dec
  • Re: Reason for Auto-increment primary keys?Peter Brawley21 Dec
    • Re: Reason for Auto-increment primary keys?James Harvard22 Dec
  • Re: Reason for Auto-increment primary keys?Rudy Lippan21 Dec
RE: Are primary keys essential?SST - Adelaide)22 Dec