I have only one thing to say really on this. Data never has any meaning at
all. It is simply data, and not information. Information is extracted from
data and is then given meaning by the viewer of the data.
In the case of NULL I have always thought of it is simply "undefined". Thats
what Orecal says it is, and Paradox, Interbase, informix, and the if I type
the list of databases I've used where that was the general idea it will go
on and on and on all day long :-)
As an aside, it was a sad day when codd died.
----- Original Message -----
From: "Bob Hall" <rjhjr@stripped>
Sent: Wednesday, September 17, 2003 8:57 PM
Subject: Re: Does Null == ""?
: Bruce Feist has initiated a discussion with me off the list, and
: has forced me to refine the way I express my position somewhat.
: The combination of on-list and off-list discussion has been interesting,
: but it's getting overwhelming, and reminds me that I have a tendency to
: overindulge in controversy. Or, more generally, I never met a keyboard
: I didn't like. (Actually, I own one that I hate, but that's another
: matter.) This will be my last contribution to this thread. I'll
: read the thread, so you can all make points at my expense, but I'll
: take a tip from Odysseus and strap myself to the mast.
: I'm not an expert on the SQL standard. Specifically, it has occured to me
: that I don't actually know whether the SQL standard defines NULL as
: "Not Known" or "Not Applicable". I've always been told that it does,
: but I can't confirm that.
: Here's what I do know. The "Interim Report 75-02-08 to the ANSI X3
: (SPARC Study Group 1975)" described fourteen possible kinds of missing
: data. Codd divided them into two groups, designated by his I-mark and
: A-mark. The ANSI SQL committee lumped them under one heading, and called
: that NULL. I think we all agree that we would have preferred Codd's
: The conventional definition of NULL, whether or not it is included in
: the SQL standard, is "Not Known" or "Not Applicable". This is both
: precise and ambiguous: Ambiguous because it has two possible meanings,
: and precise because it has only those two meanings. Of course, it can
: be further subdivided into fourteen types of missing data.
: NULL has another, operational definition which is both precise and
: unambiguous, and I can confirm that this definition is in the SQL
: standard. This definition has several parts, which define how NULL
: effects various database operations, such as SORT, SELECT, addition,
: subtraction, etc. We only need to deal with one part, from which the
: other parts can more or less be implied. That part is the requirement
: that whenever a comparison operator compares NULL with any other value,
: including itself, the operator should return UNKNOWN. Every DBMS I know
: of implements UNKNOWN as FALSE, but the standard calls for UNKNOWN.
: So an unambiguous definition of NULL, found in the SQL standard, is
: the value that always causes a comparison operator to return FALSE in
: any known DBMS.
: The problem with the unambiguous definition is that it is often the
: hardest to use. The unambiguous definition of NULL doesn't make it
: obvious why, or when, a database would contain NULL. The "Not Known"
: or "Not Applicable" definition, while ambiguous, does make it clear.
: So those of us who design databases use the ambiguous definition, and
: our computers use the unambiguous definition. This works fine, as long
: as we remember that there's no ambiguity in the way that NULL is
: implemented in a properly designed RDBMS. A computer doesn't know whether
: data is unknown or inapplicable; those concepts don't mean anything to
: it. All it cares about is whether it receives a 0 or 1 from a comparison
: To carry the anthropomorphism further, a computer doesn't care, and
: an RDBMS doesn't care, but a database lives at a higher level of
: abstraction, and it may care. For example, when I bought a radiator
: for my car, the auto parts clerk asked me if I had an automatic
: transmission. Cars that have automatic transmissions need radiators
: with extra tubes for cooling the transmission fluid. Cars that have
: manual transmissions don't need the extra tubes, and the radiators
: usually aren't interchangeable. On the other hand, I use the same
: valve lifters, regardless of the transmission type. So the auto parts
: table has to include an attribute for "goes with an automatic
: transmission". For a radiator, this would contain TRUE or FALSE.
: For a valve lifter, this would be inapplicable. If it were unknown
: instead of inapplicable, the data would have to be supplied before the
: part could be sold. So the distinction between unknown and inapplicable
: can be important, in theory.
: In practice, this rarely happens. If necessary data is missing, no
: record is created until the missing data is supplied. Parts warehouses
: don't want parts in their databases if they don't know which cars can
: use them. Usually, when someone wants to make a distinction between
: unknown and inapplicable, it's because they're trying to fit two types
: of data into a single columns, which violates the rules by which
: relational databases are constructed. For example, a bank may keep a
: record of credit limits for its customers. A database designer may be
: tempted to set aside a special number, say -1, to place in the credit
: limit column for customers who have no credit limit. The idea is to
: split NULL into its two parts, and use -1 = NULL-not-applicable, and
: NULL = NULL-not-known. But the attribute of having a credit limit is
: different information from the attribute of the credit limit size. These
: two types of data should not be in the same column. There should be
: a boolean column for credit limit existence (HasCreditLimit Boolean)
: and a currency column for the amount of the credit limit. This mistake
: is so consistent that I think of it as diagnostic; if someone is
: trying to create two types of NULLs, they're probably violating
: 1NF. Usually this happens when a database needs to store a new attribute
: that the original designer didn't anticipate, and someone gets clever
: and tries to create special flags for existing columns instead of
: creating new columns for new attributes.
: My disagreement with Bruce Feist is not over whether NULL is ambiguous;
: the definition that database designers use is explicitly ambiguous. We
: disagree over how to deal with the ambiguity. My contention is that it
: is rarely a problem in a well designed database. But I acknowledge that it
: sometimes may be a problem. Bruce advocates restricting the meaning of
: and documenting the restriction. My problem with that is that the RDBMS
: will continue to accept NULLs and process them properly, even when they
: violate the restriction. The meaning of NULL is well known, and is
: specified in the SQL standard in one form or another, so a DBA is fully
: justified in assuming that NULL is used according to the standard in
: any given database, and has no reason to look for documentation that
: tells him otherwise. NULL is any data that should cause any comparison
: to return UNKNOWN. This could be either unknown data or inapplicable data.
: This is akin to using a char or varchar column for dates. If you put
: a date into a DATE column in an invalid format, the column will spit
: the date back and complain. If you put an invalid date into a text
: column, the column will happily store it for you and all your aggregate
: queries will return inaccurate results. Similarly, if you try to use
: NULL for data that is applicable but unknown, the column will also
: accept NULL in cases where the data is inapplicable, and your assumptions
: about the meaning of NULL in that column will be wrong. The RDBMS will
: return FALSE on all comparisons with NULL, so there will be nothing to
: tell you that the column contains invalid NULLs. My experience is that
: if it is possible to put invalid data in a column, someone will do it.
: (I've done data cleanup on a subset of the payroll database for a
: multinational company. Lots of people had lots of opportunities to
: screw up the data, so they did.)
: When I've needed to specify that data in a text column is unknown, I've
: used "Unknown" or "Not Known". This has two advantages. First, it doesn't
: redefine an established term. (NULL is defined as anything that should
: a comparison to return UNKNOWN, which includes *both* unknown and
: data. This definition is widely known and accepted, and it is implemented
: the RDBMS. Changing the definition for a specific database or table is
: dangerous.) The other advantage is that it is self-documenting. Anyone
: seeing "Unknown" or "Inapplicable" will know immediately that one doesn't
: mean the other, or both.
: Numeric columns are a bigger problem. You can put a number or a NULL into
: and INT column, but not "Unknown" or "Inapplicable". In that case, the
: best solution is to allow NULLs, but add a boolean column called Unknown,
: with the restriction that both columns can't be NULL in the same row.
: So if the INT column contains a NULL, you have to specify what the NULL
: means in the Unknown column; 1 = unknown, 0 = inapplicable. If the column
: only contains non-negative integers, then you can assign -1 and -2 the
: meanings Unknown and Inapplicable. But make sure you disallow NULLs, or
: NULL will end up being used for both (as it should be if it is allowed).
: But this is coming dangerously close to cleverness, which is a bad quality
: in a database designer. These problems should occur very rarely. I'll
: that if you are trying to split NULL into its two meanings, you are
: making a design mistake, and should add columns to capture additional
: rather than shoehorn multiple attributes into one column. If you have
: to create multiple NULLs more than twice in your career, reexamine your
: Bob Hall
: MySQL General Mailing List
: For list archives: http://lists.mysql.com/mysql
: To unsubscribe: