An internal, unique, auto-increment PK is a good idea on dynamic, large
tables. Smaller, mostly static, tables can often do without a PK. In a
backend DB it's a godsend. What would happen if I depended on the
programming in the client-side code to handle the PK?? It would be
Example: Customers need a PK. An internal, auto-increment, unique, integer
works best on DBs. They are optimized for it, especially SQL tables.
Example: Counties. These names hardly ever change and are rather small. So,
you could do without the internal, integer PK. BUT-- suppose every invoice
must show the county? Then I would need a county (e.g., varchar 25) field
for the county name of Yoknapatawphahootchee. If the customer has thousands
of invoices with us then county starts to take up space. And a fair amount
of it. Millions of invoices would take up 21 millions of bytes of storage.
Just multiply number of customers by numbers of invoices. If the invoice
uses a 1 byte field for the county with an FK in the county file we
automatically save 40 millions of bytes of storage. Since no state has more
than 255 counties, a tiny int will work.
Example: States. Hmmm. Let's see, abbreviation for Minnesota is MN. Only two
bytes needed there. OK, That will work for a PK. We'll do without the
integer, auto-increment, key here. And we will violate consistency in the
DB. No big deal in this case. I'll go either way here.
Another good reason is YOU control the PK. Social Security numbers and
ZipCodes are already being prepared for changes. Why go bonkers 7 years from
now? And who wants to back-code and back-fill all that code & data?
Truncated, damaged file? What invoice does the row point to? What is the
date? If it's the last row in the PK then that's where the fixing begins by
date and by PK#.
Since a PK guarantees uniqueness, I can breathe easily if the file gets
clobbered. Example: Someone with BIG access rights accidentally deletes
invoices from 1996. But they should have done it for 1995. (We keep 10 years
history on hand in an archive file.) Easy to fix. What's the last# in 1995?
The first # in 1997? Restore only those rows with those numbers, inclusive.
Suppose now the Customer file is by name, ZIP and first 4 digits in the
address? (Or something like that. Many of my junk mail addresses show
"WAGNEKO64054A". If another Wagner, Kenneth O. arrives in zip code 64054,
presumably, he becomes "WAGNEKO64054B." No middle initial? The it's
"WAGNEK_64054.") How could I know which customers to restore without going
thru a lot of queries?
What if I have to break up a table because it's getting too large? Has too
many columns or rows added to it in the last year? The integer PK works
really well. And it is small and simple to eyeball. And I can still have any
other unique index I want on columns in the new related 2nd table.
Mostly, I like the int, auto-increment, unique, unsigned PK because it's
used everywhere, always looks the same and is very, very fast because the
indexes are small, optimized for SQL usage and very parsimonious about RAM.
Intuitively, the numbers also tell me about size and activity levels.
Best of all, they are inviolate. I can trust them. Well, OK, maybe once a
decade a gamma ray hits the oxide layer on the disk drive and changes
3,212,434,334 to 3,712,434,334. Still easy to fix. Drop the key and
re-create it. It will automatically show up. Either as a gap or a duplicate.
But it WILL show up as something.
Bottom line, uniqueness, stability and order are the "sine qua non"* of good
data organization. (*Means without which nothing.)
----- Original Message -----
From: "James Harvard" <james.lists.tech@stripped>
Sent: Wednesday, December 21, 2005 6:01 PM
Subject: Are primary keys essential?
> The PK thread has reminded me of a question I had but never resolved when
> designing the table structure of the big data warehouse app I was droning
> on about just now in the aforementioned thread. As need to import some
> hundreds of millions of rows in the next week, I think now would be a good
> idea to get a definite answer!
> The core of the app is a mass of data, broken into many tables that I
> normally only need to query individually. Because I felt uneasy not
> including a primary key and need to get a proof-of-concept db running I
> ended up putting an auto_increment int column in the data tables. (Yes, I
> know, an extra 4 bytes per row when I was talking about saving every byte
> possible in my last post. <blush>) But the PK column is never used either
> as a foreign key or in app code for the table itself. But I couldn't put a
> PK on a combination of other columns, because I don't think I can be sure
> of uniqueness. Can I just drop the PK column?
> BTW I'm sure this is addressed in all those good books on database design
> and theory I should have, but never have, read. But I'm a bit short of
> time, and it's quicker just to pick the brains of you folks! Quicker for
> me, that is - sorry!
> James Harvard
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: