List:General Discussion« Previous MessageNext Message »
From:Kenneth Wagner Date:December 22 2005 1:25am
Subject:Re: Are primary keys essential?
View as plain text  
Hi James,

An internal, unique, auto-increment PK is a good idea on dynamic, large 
tables. Smaller, mostly static, tables can often do without a PK. In a 
backend DB it's a godsend. What would happen if I depended on the 
programming in the client-side code to handle the PK?? It would be 

Example:  Customers need a PK. An internal, auto-increment, unique, integer 
works best on DBs. They are optimized for it, especially SQL tables.

Example: Counties. These names hardly ever change and are rather small. So, 
you could do without the internal, integer PK. BUT-- suppose every invoice 
must show the county? Then I would need a county (e.g., varchar 25) field 
for the county name of Yoknapatawphahootchee. If the customer has thousands 
of invoices with us then county starts to take up space. And a fair amount 
of it. Millions of invoices would take up 21 millions of bytes of storage. 
Just multiply number of customers by numbers of invoices. If the invoice 
uses a 1 byte field for the county with an FK in the county file we 
automatically save 40 millions of bytes of storage. Since no state has more 
than 255 counties, a tiny int will work.

Example: States. Hmmm. Let's see, abbreviation for Minnesota is MN. Only two 
bytes needed there. OK, That will work for a PK. We'll do without the 
integer, auto-increment, key here. And we will violate consistency in the 
DB. No big deal in this case. I'll go either way here.

Another good reason is YOU control the PK. Social Security numbers and 
ZipCodes are already being prepared for changes. Why go bonkers 7 years from 
now? And who wants to back-code and back-fill all that code & data?

Truncated, damaged file? What invoice does the row point to? What is the 
date? If it's the last row in the PK then that's where the fixing begins by 
date and by PK#.

Since a PK guarantees uniqueness, I can breathe easily if the file gets 
clobbered. Example: Someone with BIG access rights accidentally deletes 
invoices from 1996. But they should have done it for 1995. (We keep 10 years 
history on hand in an archive file.) Easy to fix. What's the last# in 1995? 
The first # in 1997? Restore only those rows with those numbers, inclusive. 
Suppose now the Customer file is by name, ZIP and first 4 digits in the 
address? (Or something like that. Many of my junk mail addresses show 
"WAGNEKO64054A". If another Wagner, Kenneth O. arrives in zip code 64054, 
presumably, he becomes "WAGNEKO64054B." No middle initial? The it's 
"WAGNEK_64054.") How could I know which customers to restore without going 
thru a lot of queries?

What if I have to break up a table because it's getting too large? Has too 
many columns or rows added to it in the last year? The integer PK works 
really well. And it is small and simple to eyeball. And I can still have any 
other unique index I want on columns in the new related 2nd table.

Mostly, I like the int, auto-increment, unique, unsigned PK because it's 
used everywhere, always looks the same and is very, very fast because the 
indexes are small, optimized for SQL usage and very parsimonious about RAM. 
Intuitively, the numbers also tell me about size and activity levels.

Best of all, they are inviolate. I can trust them. Well, OK, maybe once a 
decade a gamma ray hits the oxide layer on the disk drive and changes 
3,212,434,334 to 3,712,434,334. Still easy to fix. Drop the key and 
re-create it. It will automatically show up. Either as a gap or a duplicate. 
But it WILL show up as something.

Bottom line, uniqueness, stability and order are the "sine qua non"* of good 
data organization.  (*Means without which nothing.)



----- Original Message ----- 
From: "James Harvard" <>
To: <mysql@stripped>
Sent: Wednesday, December 21, 2005 6:01 PM
Subject: Are primary keys essential?

> The PK thread has reminded me of a question I had but never resolved when 
> designing the table structure of the big data warehouse app I was droning 
> on about just now in the aforementioned thread. As need to import some 
> hundreds of millions of rows in the next week, I think now would be a good 
> idea to get a definite answer!
> The core of the app is a mass of data, broken into many tables that I 
> normally only need to query individually. Because I felt uneasy not 
> including a primary key and need to get a proof-of-concept db running I 
> ended up putting an auto_increment int column in the data tables. (Yes, I 
> know, an extra 4 bytes per row when I was talking about saving every byte 
> possible in my last post. <blush>) But the PK column is never used either 
> as a foreign key or in app code for the table itself. But I couldn't put a 
> PK on a combination of other columns, because I don't think I can be sure 
> of uniqueness. Can I just drop the PK column?
> BTW I'm sure this is addressed in all those good books on database design 
> and theory I should have, but never have, read. But I'm a bit short of 
> time, and it's quicker just to pick the brains of you folks! Quicker for 
> me, that is - sorry!
> TIA,
> James Harvard
> -- 
> MySQL General Mailing List
> For list archives:
> To unsubscribe: 

Reason for Auto-increment primary keys?Rhino21 Dec
  • Re: Reason for Auto-increment primary keys?Kenneth Wagner21 Dec
    • Re: Reason for Auto-increment primary keys?SGreen21 Dec
      • Re: Reason for Auto-increment primary keys?Rhino21 Dec
        • Re: Reason for Auto-increment primary keys?SGreen22 Dec
      • Re: Reason for Auto-increment primary keys?Kenneth Wagner22 Dec
        • Re: Reason for Auto-increment primary keys?David Griffiths22 Dec
        • Are primary keys essential?James Harvard22 Dec
      • Re: Are primary keys essential?Kenneth Wagner22 Dec
        • Re: Are primary keys essential?James Harvard22 Dec
          • Re: Are primary keys essential?James Harvard22 Dec
      • Re: Are primary keys essential?Kenneth Wagner22 Dec
      • Re: Are primary keys essential?Rhino22 Dec
      • Reporting tools for summary dataC.R.Vegelin22 Dec
        • Re: Reporting tools for summary dataJames Harvard22 Dec
    • Re: Reason for Auto-increment primary keys?Josh Trutwin21 Dec
  • Re: Reason for Auto-increment primary keys?Peter Brawley21 Dec
    • Re: Reason for Auto-increment primary keys?James Harvard22 Dec
  • Re: Reason for Auto-increment primary keys?Rudy Lippan21 Dec
RE: Are primary keys essential?SST - Adelaide)22 Dec