From: Date: December 22 2005 12:09am Subject: Re: Reason for Auto-increment primary keys? List-Archive: http://lists.mysql.com/mysql/193144 Message-Id: <00d701c60683$a417e670$021ff504@net> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit Rhino, What I do is put the ID (integer Primary Key, auto-increment, unique) first. First key in every file. Then define my indexes. You could, do it the other way as you ask. But, I found this way is consistent. It can be traced anywhere on any file. Timestamp on important or critical files is also a big plus. It's not needed on static or almost static files like counties, states, departments, etc. But orders, parts, inventory, customers and so on benefit from a timestamp. Beside SQL is optimized for it AFAIK. This hasn't ever been an issue aside from a few people asking about how to use the ID? Or what's it for? Ken ----- Original Message ----- From: "Rhino" To: "Kenneth Wagner" ; Cc: "mysql" Sent: Wednesday, December 21, 2005 4:57 PM Subject: Re: Reason for Auto-increment primary keys? > > ----- Original Message ----- > From: > To: "Kenneth Wagner" > Cc: "mysql" ; "Rhino" > Sent: Wednesday, December 21, 2005 5:15 PM > Subject: Re: Reason for Auto-increment primary keys? > > >> "Kenneth Wagner" wrote on 12/21/2005 >> 04:27:53 PM: >> >>> Hi Rhino, >>> >>> Excellent question. Felt as you do, initially. >>> >>> Here's what changed my mind. >>> >>> Integer keys are fast. And small. Hence, they take very little RAM >> space. >>> >>> They are contiguous. A missing PK is easy to find. There's a gap in the >>> number sequence. >>> Can't do this with the part description. No way to tell if a record is >>> missing. >>> >>> Example: The system gets hung up or crashes and a reboot is needed. >>> How to test the integrity of the parts table. I.e., anything missing? >> Check >>> the PK for >>> continuity is a good place to start. With a timestamp I would even know >> the >>> date >>> where the file got truncated. Example. It's Dec 20th. The highest date >> in >>> the file is >>> Dec 1st at rec# 1203023. That's where the analysis would begin. Other >> files >>> that >>> didn't get truncated but have the related key # in them would tip me off >> as >>> to how >>> much is missing. Like an order file. >>> >>> Speed. Especially where related files are concerned. Foreign keys. Links >> on >>> integer >>> fields are faster, smaller and more efficient. Keys remain smaller and >>> faster. >>> >>> Activity testing: Let's say I do some statistical testing. Like how >> many >>> new parts >>> per month on average. Easy to do with the integer PK. Even easier if it >> has >>> a timestamp. >>> Then if the average suddenly drops or increases I would want to know >> why. Or >>> modify >>> my DB tables or coding. Note that the timestamp does not have to be in >> your >>> example >>> table. It could be in an insert/update table that just tracks what has >> been >>> added or updated >>> by PK, timestamp, activity type and updatedbyuserID. >>> >>> So, there's 2 cents worth. >>> >>> Wondering how relevant this is? >>> >>> HTH, >>> >>> Ken Wagner >>> >>> >>> >>> ----- Original Message ----- >>> From: "Rhino" >>> To: "mysql" >>> Sent: Wednesday, December 21, 2005 2:54 PM >>> Subject: Reason for Auto-increment primary keys? >>> >>> >>> > One technique that I see a lot on this mailing list is people putting >>> > auto-incremented integer primary keys on their tables. >>> > >>> > Maybe I'm just "old school" but I've always thought that you should >> choose >>> > a primary key based on data that is actually in the table whenever >>> > possible, rather than generating a new value out of thin air. >>> > >>> > The only exception that comes to mind is things like ID numbers; for >>> > example, it is better to use an internally-generated integer for an >>> > employee number than it is to use an employee's name. Even the >> combination >>> > of first name and last name is not necessarily unique - I could cite a >> >>> > real life example -and, of course, people can change their names. That >> >>> > makes names less desireable than a generated value when you are trying >> to >>> > uniquely indentify such entities. In such a case, a nice, reasonable >> short >>> > integer is easier. >>> > >>> > I just found this rather good definition of primary keys at >>> > http://www.utexas.edu/its/windows/database/datamodeling/dm/keys.html. >> The >>> > relevant bit says that a primary key must have: >>> > - a non-null value for each instance of the entity >>> > - a value that is unique for each instance of an entity >>> > - a value that must not change or become null during the life of the >> each >>> > instance of the entity >>> > >>> > That article makes the same basic remarks about name vs. ID but makes >> the >>> > point that it is more commonly the case that table designers will use >>> > something like a social security number - an _externally_ generated >>> > number - to distinguish between employees rather than an >>> > internally-generated number. >>> > >>> > But the trend in this mailing list is toward using generated values as >> >>> > primary keys in virtually EVERY table, even when good primary keys can >> be >>> > found in the (non-generated) data already existing in the table. >>> > >>> > Now, I haven't done anything remotely resembling a quantified analysis >> so >>> > maybe I'm wildly exaggerating this trend. But I do seem to recall a >> lot of >>> > table descriptions with auto-generated keys and I don't think they >> were >>> > all a name vs. ID scenario.... >>> > >>> > Has anyone else noticed a similar trend? >>> > >>> > If this trend is real, it doesn't seem like a very good trend to me. >> For >>> > example, if you were keeping track of parts in a warehouse, why would >>> > anyone make a table that looked like this: >>> > ID (autogenerated PK) PART_NO PART_DESCRIPTION >>> > 1 A01 Widget >>> > 2 B03 Grapple Grommet >>> > 3 A02 Snow Shovel >>> > 4 D11 Whisk >>> > 5 C04 Duct Tape >>> > >>> > when this table is simpler: >>> > >>> > PART_NO (PK) PART_DESCRIPTION >>> > A01 Widget >>> > B03 Grapple Grommet >>> > A02 Snow Shovel >>> > D11 Whisk >>> > C04 Duct Tape >>> > >>> > Would anyone care to convince me that the first version of the table >> is >>> > "better" than the second version in some way? >>> > >>> > I just want to be sure that no one has come along with some new and >>> > compelling reason to autogenerate keys when perfectly good keys can be >> >>> > found within the data already. I don't mind being "old school" but I >> don't >>> > want to be "out to lunch" :-) >>> > >>> > >>> > Rhino >>> > >>> > >>> > >>> > -- >>> > No virus found in this outgoing message. >>> > Checked by AVG Free Edition. >>> > Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date: >> 20/12/2005 >>> > >> >> I agree with every point that Kenneth just made. Integers compare 5 to 50 >> times faster than strings (depending on the length of the string) and >> usually take up much less room. That means that more index items can fit >> into memory and you are less likely to cause memory paging during an >> index >> operation. >> >> I frequently define both an auto_inc field and a PK on other values. I >> use >> the auto_inc field for FK relationships (due to the already mentioned >> reasons) but the PK is there to preserve my data integrity. >> >> Basically, the heavy use of auto_increment is a practical compromise of >> form vs. speed. >> > Well, you (meaning everyone who has responded, not just Shawn) have > certainly given me a wealth of good reasons to rethink my position on > auto-incremented keys! I will have to review these reasons very seriously > about this the next time I design a table.... > > Shawn, I'm not quite clear what you are saying in your second last > paragraph. When you have this situation: > > ID (autogenerated) PART_NO PART_DESCRIPTION > 1 A01 Widget > 2 B03 Grapple Grommet > 3 A02 Snow Shovel > 4 D11 Whisk > 5 C04 Duct Tape > > Do you put the PK on ID alone, PART_NO alone, or the concatentation of ID > and PART_NO? I _think_ you mean that the PK is on PART_NO alone and that > ID is simply defined unique so that it can be the target for FKs that > refer to it but I want to be sure I'm not misreading you.... > > Rhino > > > > > -- > No virus found in this outgoing message. > Checked by AVG Free Edition. > Version: 7.1.371 / Virus Database: 267.14.2/208 - Release Date: 20/12/2005 > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: > http://lists.mysql.com/mysql?unsub=kenneth_wagner@stripped > >