From: Date: December 21 2005 11:57pm Subject: Re: Reason for Auto-increment primary keys? List-Archive: http://lists.mysql.com/mysql/193143 Message-Id: <019b01c60681$e03252a0$0d02a8c0@athlon> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=iso-8859-1; reply-type=original Content-Transfer-Encoding: 7bit ----- Original Message ----- From: To: "Kenneth Wagner" Cc: "mysql" ; "Rhino" Sent: Wednesday, December 21, 2005 5:15 PM Subject: Re: Reason for Auto-increment primary keys? > "Kenneth Wagner" wrote on 12/21/2005 > 04:27:53 PM: > >> Hi Rhino, >> >> Excellent question. Felt as you do, initially. >> >> Here's what changed my mind. >> >> Integer keys are fast. And small. Hence, they take very little RAM > space. >> >> They are contiguous. A missing PK is easy to find. There's a gap in the >> number sequence. >> Can't do this with the part description. No way to tell if a record is >> missing. >> >> Example: The system gets hung up or crashes and a reboot is needed. >> How to test the integrity of the parts table. I.e., anything missing? > Check >> the PK for >> continuity is a good place to start. With a timestamp I would even know > the >> date >> where the file got truncated. Example. It's Dec 20th. The highest date > in >> the file is >> Dec 1st at rec# 1203023. That's where the analysis would begin. Other > files >> that >> didn't get truncated but have the related key # in them would tip me off > as >> to how >> much is missing. Like an order file. >> >> Speed. Especially where related files are concerned. Foreign keys. Links > on >> integer >> fields are faster, smaller and more efficient. Keys remain smaller and >> faster. >> >> Activity testing: Let's say I do some statistical testing. Like how > many >> new parts >> per month on average. Easy to do with the integer PK. Even easier if it > has >> a timestamp. >> Then if the average suddenly drops or increases I would want to know > why. Or >> modify >> my DB tables or coding. Note that the timestamp does not have to be in > your >> example >> table. It could be in an insert/update table that just tracks what has > been >> added or updated >> by PK, timestamp, activity type and updatedbyuserID. >> >> So, there's 2 cents worth. >> >> Wondering how relevant this is? >> >> HTH, >> >> Ken Wagner >> >> >> >> ----- Original Message ----- >> From: "Rhino" >> To: "mysql" >> Sent: Wednesday, December 21, 2005 2:54 PM >> Subject: Reason for Auto-increment primary keys? >> >> >> > One technique that I see a lot on this mailing list is people putting >> > auto-incremented integer primary keys on their tables. >> > >> > Maybe I'm just "old school" but I've always thought that you should > choose >> > a primary key based on data that is actually in the table whenever >> > possible, rather than generating a new value out of thin air. >> > >> > The only exception that comes to mind is things like ID numbers; for >> > example, it is better to use an internally-generated integer for an >> > employee number than it is to use an employee's name. Even the > combination >> > of first name and last name is not necessarily unique - I could cite a > >> > real life example -and, of course, people can change their names. That > >> > makes names less desireable than a generated value when you are trying > to >> > uniquely indentify such entities. In such a case, a nice, reasonable > short >> > integer is easier. >> > >> > I just found this rather good definition of primary keys at >> > http://www.utexas.edu/its/windows/database/datamodeling/dm/keys.html. > The >> > relevant bit says that a primary key must have: >> > - a non-null value for each instance of the entity >> > - a value that is unique for each instance of an entity >> > - a value that must not change or become null during the life of the > each >> > instance of the entity >> > >> > That article makes the same basic remarks about name vs. ID but makes > the >> > point that it is more commonly the case that table designers will use >> > something like a social security number - an _externally_ generated >> > number - to distinguish between employees rather than an >> > internally-generated number. >> > >> > But the trend in this mailing list is toward using generated values as > >> > primary keys in virtually EVERY table, even when good primary keys can > be >> > found in the (non-generated) data already existing in the table. >> > >> > Now, I haven't done anything remotely resembling a quantified analysis > so >> > maybe I'm wildly exaggerating this trend. But I do seem to recall a > lot of >> > table descriptions with auto-generated keys and I don't think they > were >> > all a name vs. ID scenario.... >> > >> > Has anyone else noticed a similar trend? >> > >> > If this trend is real, it doesn't seem like a very good trend to me. > For >> > example, if you were keeping track of parts in a warehouse, why would >> > anyone make a table that looked like this: >> > ID (autogenerated PK) PART_NO PART_DESCRIPTION >> > 1 A01 Widget >> > 2 B03 Grapple Grommet >> > 3 A02 Snow Shovel >> > 4 D11 Whisk >> > 5 C04 Duct Tape >> > >> > when this table is simpler: >> > >> > PART_NO (PK) PART_DESCRIPTION >> > A01 Widget >> > B03 Grapple Grommet >> > A02 Snow Shovel >> > D11 Whisk >> > C04 Duct Tape >> > >> > Would anyone care to convince me that the first version of the table > is >> > "better" than the second version in some way? >> > >> > I just want to be sure that no one has come along with some new and >> > compelling reason to autogenerate keys when perfectly good keys can be > >> > found within the data already. I don't mind being "old school" but I > don't >> > want to be "out to lunch" :-) >> > >> > >> > Rhino >> > >> > >> > >> > -- >> > No virus found in this outgoing message. >> > Checked by AVG Free Edition. >> > Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date: > 20/12/2005 >> > > > I agree with every point that Kenneth just made. Integers compare 5 to 50 > times faster than strings (depending on the length of the string) and > usually take up much less room. That means that more index items can fit > into memory and you are less likely to cause memory paging during an index > operation. > > I frequently define both an auto_inc field and a PK on other values. I use > the auto_inc field for FK relationships (due to the already mentioned > reasons) but the PK is there to preserve my data integrity. > > Basically, the heavy use of auto_increment is a practical compromise of > form vs. speed. > Well, you (meaning everyone who has responded, not just Shawn) have certainly given me a wealth of good reasons to rethink my position on auto-incremented keys! I will have to review these reasons very seriously about this the next time I design a table.... Shawn, I'm not quite clear what you are saying in your second last paragraph. When you have this situation: ID (autogenerated) PART_NO PART_DESCRIPTION 1 A01 Widget 2 B03 Grapple Grommet 3 A02 Snow Shovel 4 D11 Whisk 5 C04 Duct Tape Do you put the PK on ID alone, PART_NO alone, or the concatentation of ID and PART_NO? I _think_ you mean that the PK is on PART_NO alone and that ID is simply defined unique so that it can be the target for FKs that refer to it but I want to be sure I'm not misreading you.... Rhino -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date: 20/12/2005