Bob Hall wrote:
> Bruce Feist has initiated a discussion with me off the list,
Off the list by accident, by the way. I sometimes forget that in this
list I need to do a REPLY ALL. I generally don't go private unless I
want to avoid embarassing someone or need to discuss something genuinely
of no interest to others in the group; in this case, the conversation
was of general interest and reasonably civil <g>.
> The conventional definition of NULL, whether or not it is included in
> the SQL standard, is "Not Known" or "Not Applicable". This is both
> precise and ambiguous: Ambiguous because it has two possible meanings,
> and precise because it has only those two meanings.
I'll agree with that.
> an unambiguous definition of NULL, found in the SQL standard, is
> the value that always causes a comparison operator to return FALSE in
> any known DBMS.
Interesting approach. I haven't thought it through completely enough to
decide whether or not I can agree that this is an unambiguous
definition. For instance, I'm not sure what SQL is supposed to evaluate
0/0 as -- is it NULL, or does it invalidate the statement computing it,
or is it some non-NULL null-like thing? (Rhetorical question; I don't
necessarily expect an answer, although one would be nice.)
I do *not* think that it's a useful definition, though, because of the
weakness you point out. It doesn't tell the designer how to use the
feature. I cannot think of any case where it is useful to store a NULL
if you don't know what it means other than that it's a value which when
compared to other things returns UNKNOWN! In many ways, I see this as
analagous to identifying the units that a numeric field is identified
in; the field won't tell you whether it's in fortnights or seconds, so
the documentation must.
> the distinction between unknown and inapplicable
> can be important, in theory.
>
> a bank may keep a
> record of credit limits for its customers. A database designer may be
> tempted to set aside a special number, say -1, to place in the credit
> limit column for customers who have no credit limit.
> the attribute of having a credit limit is
> different information from the attribute of the credit limit size. These
> two types of data should not be in the same column. There should be
> a boolean column for credit limit existence (HasCreditLimit Boolean)
> and a currency column for the amount of the credit limit.
If you don't mind a brief red herring, consider the fact that not all
RDBMSs support CHECK constraints, and in those which don't this approach
gives rise to the possibility of inconsistent data: what happens if
HasCreditLimit is FALSE and CreditLimit is $10000?
It's a red herring, of course, because we're letting the real world
intrude upon our theoretical discussion. In any real DBMS, experienced
DBDs know that there are compromises which must be made; these don't
invalidate what we'd *like* to do, but sometimes are things that we must
do to have a well-functioning, or at least usable, system. Lack of a
CHECK constraint in the target DBMS doesn't impact what the correct
logical design is.
> This mistake
> is so consistent that I think of it as diagnostic; if someone is
> trying to create two types of NULLs,
I wasn't suggesting creating two types of NULLs... I was suggesting that
if NULLs are permitted for a column, the DBD should indicate which
meaning of NULL is in use for that column.
> My disagreement with Bruce Feist is not over whether NULL is ambiguous;
> the definition that database designers use is explicitly ambiguous. We
> disagree over how to deal with the ambiguity. My contention is that it
> is rarely a problem in a well designed database. But I acknowledge that it
> sometimes may be a problem. Bruce advocates restricting the meaning of NULL
> and documenting the restriction.
I'm not sure what you mean by "restricting". If you mean "permitting
only one of the two possible uses of NULL for a given column", then
agree... and I think that *you* think that this is a good idea as well,
given your above comments on permitting both being a violation of
normalization.
To rephrase, I'm confused about your viewpoint. On one hand you're
saying that if a field can contain both I-mark and A-mark NULLs that's a
bad design choice; on the other, you're saying that since most current
RDBMSs don't distinguish betwen I-mark and A-mark NULLs, trying to
restrict the meaning to just one is a bad design choice. You're clearly
knowlegable and have given thought to the subject, so I believe that
there's something about your point of view that I'm just not getting.
Please explain!
> My problem with that is that the RDBMS
> will continue to accept NULLs and process them properly, even when they
> violate the restriction.
> When I've needed to specify that data in a text column is unknown, I've
> used "Unknown" or "Not Known".
What if "Unknown" and "Not Known" are valid values for the column? I
think that in most cases it would be best to have a separate boolean
flag column (or two of them) if NULLs are not an option, especially if
CHECK constraints are available to ensure consistency.
Bruce Feist